Abstract

When a beneficial mutation occurs in a population, the new, favored allele may spread 
to the entire population. This process is known as a selective sweep. Suppose we sample 
n individuals at the end of a selective sweep. If we focus on a site on the chromosome 
that is close to the location of the beneficial mutation, then many of the lineages will likely be 
descended from the individual that had the beneficial mutation, while others will be descended 
from a different individual because of recombination between the two sites. We introduce two 
approximations for the effect of a selective sweep. The first one is simple but not very accurate: 
flip n independent coins with probability p of heads and say that the lineages whose coins 
come up heads are those that are descended from the individual with the beneficial mutation. 
A second approximation, which is related to Kingman's paintbox construction, replaces the 
coin flips by integer-valued random variables and leads to very accurate results. 
1 Introduction



A classical continuous-time model for a population with overlapping generations is the Moran 
model, which was introduced by Moran (1958). Thinking of N diploid individuals, we assume the 
population size is fixed at 2N. However under the assumption that each individual is a random 
union of gametes, the dynamics are the same as for a population of 2N haploid individuals, so 
we will do our computation for that case. In the simplest version of the Moran model, each 
individual independently lives for a time that is exponentially distributed with mean 1 and then 
is replaced by a new individual. The parent of the new individual is chosen at random from the 
2N individuals, including the individual being replaced. 

Here we will consider a variation of the Moran model that involves two loci, one subject to 
natural selection, the other neutral, and with a probability r in each generation of recombination 
between the two loci. To begin to explain the last sentence, we assume that at the selected locus 
there are two alleles, B and b, and that the relative fitnesses of the two alleles are 1 and 1 - s. The
population then evolves with the same rules as before, except that a replacement of an individual
with a B allele by an individual with a b allele is rejected with probability s. Consequently, if at
some time there are k individuals with the B allele and 2N - k with the b allele, then the rate of
transitions that increase the number of B individuals from k to k + 1 is k(2N - k)/(2N), but the
rate of transitions that reduce the number of B individuals to k - 1 is k(2N - k)(1 - s)/(2N).
See chapter 3 of Durrett (2002) for a summary of some work with this model. 

We assume that the process starts at time zero with 2N — 1 individuals having the b allele and 
one individual having the advantageous B allele. We think of the individual with the B allele as 
having had a beneficial mutation at time zero. There is a positive probability that eventually all 
2N individuals will have the favorable allele. When this happens, we say that a selective sweep 
occurs, because the favorable allele has swept through the entire population. 
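
The positive probability of a sweep can be made concrete with a small sketch. It assumes only the transition rates stated above: up-steps and down-steps of the number of B's occur at rates in the ratio 1 : (1 - s), so the standard gambler's-ruin formula applies. The function name `fixation_probability` is hypothetical and introduced only for illustration.

```python
def fixation_probability(two_n: int, s: float) -> float:
    """Probability that a single B allele fixes among two_n haploid individuals.

    Assumption: the number of B's moves up or down with rates in the
    ratio 1 : (1 - s), so the embedded jump chain is a biased random walk
    and the gambler's-ruin formula gives the chance of hitting two_n
    before 0 when starting from 1.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie strictly between 0 and 1")
    ratio = 1.0 - s  # (down rate) / (up rate)
    return (1.0 - ratio) / (1.0 - ratio ** two_n)


if __name__ == "__main__":
    # For a large population the fixation probability is close to s.
    print(fixation_probability(20_000, 0.1))
```

For large 2N the value is essentially s, which agrees with the later remark that an individual in the approximating branching process has an infinite line of descent with probability s.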

If we assume that the entire chromosome containing the selected locus is passed down from 
one generation to the next, as is the case for the Y chromosome or mitochondrial DNA, then 
all 2N chromosomes at the end of the selective sweep will have come from the one individual
that had the beneficial mutation at the beginning of the sweep. However, non-sex chromosomes 
in diploid individuals are typically not an identical copy of one of their parents' chromosomes. 
Instead, because of a process called recombination, each chromosome that an individual inherits 
consists of pieces of each of a parent's two chromosomes. In this case, if we are interested in 
the origin of a second neutrally evolving locus on the chromosome and a selective sweep occurs 
because of an advantageous mutation at a nearby site, then some of the lineages will be traced 
back to the chromosome that had the favorable allele at the beginning of the sweep but other 
lineages will be traced back to different individuals because of recombination between the neutral 
and selected loci. When a lineage can be traced back to an individual other than the one with 
the beneficial mutation, we say that the lineage escapes from the selective sweep. 

The combined effects of recombination and selective sweeps have been studied extensively. 
Maynard Smith and Haigh (1974) observed that selective sweeps can alter the frequencies of 
alleles at sites nearby the site at which the selective sweep occurred. They referred to this as 
the "hitchhiking effect". They considered a situation with a neutral locus with alleles A and a 
and a second locus where allele B has a fitness of 1 + s relative to b. Suppose $p_0$ is the initial
frequency of the B allele, and $Q_n$ and $R_n$ are the frequencies in generation n of the A allele on
chromosomes containing B and b respectively. If $Q_0 = 0$ (i.e., the advantageous mutation arises
on a chromosome with the a allele) and the recombination probability in each generation is r,
Maynard Smith and Haigh (1974) showed (see (8) on page 25) that the frequency of the A allele
after the selective sweep is reduced from $R_0$ to
In the calculation of Maynard Smith and Haigh, the number of individuals with the B allele
grows deterministically. Kaplan, Hudson, and Langley (1989) used a model involving an initial
phase in which the number of B's is a supercritical branching process, a middle deterministic
piece where the fraction p of B's follows the logistic differential equation

\[ \frac{dp}{dt} = s p (1 - p), \tag{1.1} \]

and a final random piece where the number of b's follows a subcritical branching process. This
process is too difficult to study analytically so they resorted to simulation. 

Stephan, Wiehe, and Lenz (1992) further simplified this approach by ignoring the random
first and third phases and modeling the change in the frequency of B's by the logistic differential
equation (1.1), which has solution

\[ p(t) = \frac{p(0)}{p(0) + (1 - p(0)) e^{-st}}. \]

This approach has been popular with biologists in simulation studies (see, for example, Simonsen, 
Churchill, and Aquadro (1995) and Przeworski (2002)). However, as results in Barton (1998) and 
Durrett and Schweinsberg (2004a) show, this can introduce substantial errors, so rather than 
using this approximation for our analysis, we will consider a modification of the Moran model 
that allows for recombination as well as beneficial mutations. 
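
To make the deterministic approximation concrete, here is a minimal sketch of the logistic solution of dp/dt = s p (1 - p) and of the time it takes the curve to rise from frequency 1/(2N) to 1 - 1/(2N). The function names are hypothetical, and the choice of start and end frequencies is an assumption made for illustration rather than something taken from the cited papers.

```python
import math


def logistic_frequency(t: float, p0: float, s: float) -> float:
    """Solution p(t) of dp/dt = s * p * (1 - p) with p(0) = p0."""
    return p0 / (p0 + (1.0 - p0) * math.exp(-s * t))


def deterministic_sweep_time(two_n: int, s: float) -> float:
    """Time for the logistic curve to go from 1/two_n to 1 - 1/two_n.

    Solving p(t) = 1 - 1/two_n with p(0) = 1/two_n gives
    t = (2 / s) * log(two_n - 1).
    """
    return (2.0 / s) * math.log(two_n - 1)


if __name__ == "__main__":
    two_n, s = 20_000, 0.1
    t_end = deterministic_sweep_time(two_n, s)
    print(t_end, logistic_frequency(t_end, 1.0 / two_n, s))
```

The resulting duration is of order (2/s) log N, which is short compared with the 2N time scale of neutral coalescence.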

We consider two sites on each chromosome. At one site, each of the 2N chromosomes has
either the advantageous B allele or a b allele. Our interest, however, is in the genealogy at another
neutral site, at which all alleles have the same fitness. As before, we assume that each individual
lives for an exponential time with mean 1 and is replaced by a new individual whose parent is
chosen at random from the population, except that we disregard disadvantageous replacements
of a B chromosome by a b chromosome with probability s. We will also now assume that when a
new individual is born, it inherits alleles at both sites from the same individual with probability
1 - r. With probability r, there is recombination between the two sites, and the individual
inherits the allele at the neutral site from its parent's other chromosome. Since a parent's two 
chromosomes are considered to be two distinct individuals in the population, we model this by 
saying that the new individual inherits the two alleles from two ancestors chosen independently 
at random from the 2N individuals in the population. 

Suppose we sample n chromosomes at the end of a selective sweep and follow their ancestral
lines back until the beginning of the sweep. We will describe the genealogy of the sample by
a marked partition of {1, . . . , n}, which we define to be a partition of {1, . . . , n} in which one
block of the partition may be designated as a "marked" block. We define the marked partition $\Theta$ of
{1, . . . , n} as follows. We say that two integers i and j are in the same block of $\Theta$, denoted $i \sim_\Theta j$,
if and only if the alleles at the neutral site on the ith and jth chromosomes in the sample have
the same ancestor at the beginning of the sweep. Thus, if we are following the lineages associated
with the allele at the neutral site, we have $i \sim_\Theta j$ if and only if the ith and jth lineages coalesce
during the selective sweep. We also "mark" the block of $\Theta$ containing the integers i for which the
ith individual is descended from the individual that had the beneficial mutation at the beginning
of the sweep. Thus, to understand how a selective sweep affects the genealogy of a sample of size
n, we need to understand the distribution of the random marked partition $\Theta$.

In this paper, we study two approximations to the distribution of $\Theta$. The approximations were
introduced, and studied by simulation, in Durrett and Schweinsberg (2004a). Here we provide
precise bounds on the error in the approximations. The idea behind the first approximation is
that a large number of lineages will inherit their allele at the neutral site from the individual that
had the beneficial mutation at the beginning of the sweep, and the corresponding integers i will
be in the marked block of $\Theta$. With high probability, the lineages that escape the selective sweep
do not coalesce with one another, so the corresponding integers are in singleton blocks of $\Theta$.

Before stating the first approximation precisely, we need a definition. Let $p \in [0,1]$. Let
$\xi_1, \ldots, \xi_n$ be independent random variables such that, for i = 1, . . . , n, we have $P(\xi_i = 1) = p$
and $P(\xi_i = 0) = 1 - p$. We call the random marked partition of {1, . . . , n} such that one marked
block consists of $\{i \in \{1, \ldots, n\} : \xi_i = 1\}$ and the remaining blocks are singletons a p-partition
of {1, . . . , n}. Let $Q_p$ denote the distribution of a p-partition of {1, . . . , n}.
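
The definition of a p-partition translates directly into a sampler. The following is a minimal sketch; the function name `sample_p_partition` and the representation of the result as a pair (marked block, list of singleton blocks) are choices made here for illustration.

```python
import random


def sample_p_partition(n, p, rng=None):
    """Draw one p-partition of {1, ..., n}.

    Each integer independently joins the marked block with probability p
    (its coin comes up heads); every other integer forms a singleton block.
    Returns (marked_block, singleton_blocks). The marked block may be empty.
    """
    rng = rng or random.Random()
    marked = frozenset(i for i in range(1, n + 1) if rng.random() < p)
    singletons = [frozenset([i]) for i in range(1, n + 1) if i not in marked]
    return marked, singletons


if __name__ == "__main__":
    marked, singles = sample_p_partition(10, 0.6, random.Random(1))
    print(sorted(marked), [sorted(b) for b in singles])
```

Here the size of the marked block is Binomial(n, p), and no two unmarked integers ever share a block.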

Theorem 1.1 below shows that the distribution of $\Theta$ can be approximated by the distribution
of a p-partition. For this result, and throughout the rest of the paper, we assume that the selective
advantage s is a fixed constant that does not depend on the population size N. However, the
recombination probability r is allowed to depend on N, even though we have not recorded this
dependence in the notation. We will assume throughout the paper that $r \le C_0/(\log N)$ for some
positive constant $C_0$. We denote by $\mathcal{P}_n$ the set of marked partitions of {1, . . . , n}.

Theorem 1.1. Fix $n \in \mathbb{N}$. Let $\alpha = r \log(2N)/s$. Let $p = e^{-\alpha}$. Then there exists a positive
constant C such that $|P(\Theta = \pi) - Q_p(\pi)| \le C/(\log N)$ for all N and all $\pi \in \mathcal{P}_n$.

In this theorem, and throughout the rest of the paper, C denotes a positive constant that may 
depend on s but does not depend on r or N . The value of C may change from line to line. 

A consequence of Theorem 1.1 is that if $\lim_{N \to \infty} r \log(2N)/s = \alpha$ for some $\alpha \in (0, \infty)$ and
$p = e^{-\alpha}$, then the distribution of $\Theta$ converges to $Q_p$ as $N \to \infty$. However, the rate of convergence
that the theorem gives is rather slow, and simulation results of Barton (1998) and Durrett and
Schweinsberg (2004a) show that the approximation is not very accurate for realistic values of
N. Consequently, it is necessary to look for a better approximation. Theorem 1.2 below gives
an approximation with an error term that is of order $1/(\log N)^2$ rather than $1/\log N$. It follows
from the improved approximation that the error in Theorem 1.1 is actually of order $1/\log N$.

The motivation for the second approximation comes from the observation that, at the
beginning of the selective sweep, the number of B's can be approximated by a continuous-time
branching process in which each individual gives birth at rate 1 and dies at rate 1 - s. Some
individuals in this supercritical branching process will have an infinite line of descent, meaning that
they have descendants alive in the population at all future times. As we will show later, the
individuals with an infinite line of descent can be approximated by a Yule process, a continuous-time
branching process in which each individual splits into two at a constant rate s. Since our sample,
taken at the end of the selective sweep, comes from lineages that have survived a long time, we
can get a good approximation to the genealogy by considering only individuals with an infinite
line of descent. We will also show that, during the time when there are exactly $k \ge 2$ lineages
with an infinite line of descent, the expected number of recombinations along these lineages is
r/s. For simplicity, we assume that the number of such recombinations is always either 0 or 1.
Such a recombination causes individuals descended from the lineage with the recombination to
be traced back to an ancestor at time zero different from descendants of the other k - 1 lineages
(and therefore to belong to a different block of $\Theta$). Well-known facts about the Yule process (see
e.g. Joyce and Tavaré, 1987) imply that when there are k lineages, the fraction of individuals at
the end of the sweep that are descendants of a given lineage has approximately a Beta(1, k - 1)
distribution. Furthermore, we will show that with probability r(1 - s)/(r(1 - s) + s), there is a
recombination when there is only one individual with an infinite line of descent, in which case
none of the sampled lineages will get traced back to the individual with the B allele at time zero.

These observations motivate the definition of a class of marked partitions of {1, . . . , n}, which
we will use to approximate the distribution of $\Theta$. The construction resembles the paintbox
construction of exchangeable random partitions due to Kingman (1978). To start the construction,
assume 0 < r < s, and let L be a positive integer. Then let $(W_k)_{k=2}^L$ be independent random
variables such that $W_k$ has a Beta distribution with parameters 1 and k - 1. Let $(\zeta_k)_{k=2}^L$ be a
sequence of independent random variables such that $P(\zeta_k = 1) = r/s$ and $P(\zeta_k = 0) = 1 - r/s$
for all k. As the reader might guess from the probabilities, $\zeta_k = 1$ corresponds to a recombination
when there are k lineages with an infinite line of descent. For k = 2, 3, . . . , L, let $V_k = \zeta_k W_k$,
and let $Y_k = V_k \prod_{j=k+1}^L (1 - V_j)$ be the fraction of individuals carried away by recombination.

Let $Y_1 = \prod_{j=2}^L (1 - V_j)$. Note that $\sum_{k=1}^L Y_k = 1$. Finally, let $Q_{r,s,L}$ be the distribution of the
random marked partition $\Pi$ of {1, . . . , n} constructed in the following way. Define random
variables $Z_1, \ldots, Z_n$ to be conditionally independent given $(Y_k)_{k=1}^L$ such that for i = 1, . . . , n and
j = 1, . . . , L, we have $P(Z_i = j \mid (Y_k)_{k=1}^L) = Y_j$. Here the integers i such that $Z_i = k$ correspond
to lineages that recombine when there are k members of the B population with an infinite line of
descent. Then define $\Pi$ such that $i \sim_\Pi j$ if and only if $Z_i = Z_j$. Independently of $(Z_i)_{i=1}^n$, we mark
the block $\{i : Z_i = 1\}$ with probability s/(r(1 - s) + s) and, with probability r(1 - s)/(r(1 - s) + s),
we mark no block. When the block is marked, the integers i such that $Z_i = 1$ correspond to the
lineages that do not recombine and therefore can be traced back to the individual that had the
beneficial mutation at time zero; otherwise, they correspond to the lineages that recombine when
there is only one member of the B population with an infinite line of descent.
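
The construction above can be sketched as a sampler. This is an illustrative reading of the definition, not code from the paper: the function name `sample_paintbox_partition` and the return format are hypothetical, the Beta variables are drawn with the standard-library `random.betavariate`, and the labels $Z_i$ are drawn with `random.choices` using the weights $Y_1, \ldots, Y_L$.

```python
import random


def sample_paintbox_partition(n, r, s, L, rng=None):
    """Draw one marked partition of {1, ..., n} following the construction above.

    Returns (blocks, marked_block); marked_block is None when no block is marked
    or when no sampled integer received the label 1.
    """
    if not 0.0 < r < s < 1.0:
        raise ValueError("the construction assumes 0 < r < s < 1")
    rng = rng or random.Random()

    # V_k = zeta_k * W_k with zeta_k ~ Bernoulli(r/s), W_k ~ Beta(1, k - 1).
    v = {k: (rng.betavariate(1, k - 1) if rng.random() < r / s else 0.0)
         for k in range(2, L + 1)}

    # Y_k = V_k * prod_{j > k} (1 - V_j) and Y_1 = prod_{j >= 2} (1 - V_j),
    # computed from k = L downwards with a running product.
    y = [0.0] * (L + 1)  # y[k] holds Y_k; index 0 is unused
    remaining = 1.0
    for k in range(L, 1, -1):
        y[k] = v[k] * remaining
        remaining *= 1.0 - v[k]
    y[1] = remaining

    # Z_1, ..., Z_n are i.i.d. given Y, with P(Z_i = j) = Y_j.
    labels = rng.choices(range(1, L + 1), weights=y[1:], k=n)
    blocks = {}
    for i, z in enumerate(labels, start=1):
        blocks.setdefault(z, set()).add(i)

    # Mark the block {i : Z_i = 1} with probability s / (r(1 - s) + s).
    mark = rng.random() < s / (r * (1.0 - s) + s)
    marked = frozenset(blocks[1]) if mark and 1 in blocks else None
    return [frozenset(b) for b in blocks.values()], marked


if __name__ == "__main__":
    print(sample_paintbox_partition(10, 0.005, 0.1, 2000, random.Random(7)))
```

Unlike a p-partition, this sampler can place several integers in the same unmarked block, which is how the second approximation captures lineages that coalesce and then escape the sweep.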

We are now ready to state our main approximation theorem, which says that the distribution
of $\Theta$ can be approximated well by the distribution $Q_{r,s,L}$, where $L = \lfloor 2Ns \rfloor$, and $\lfloor m \rfloor$ denotes
the greatest integer less than or equal to m. The choice of L comes from the fact that in a
continuous-time branching process with births at rate 1 and deaths at rate 1 - s, each individual
has an infinite line of descent with probability s. Therefore, the number of such individuals at
the end of the selective sweep is approximately L.

Theorem 1.2. Fix $n \in \mathbb{N}$ and let $L = \lfloor 2Ns \rfloor$. Then there exists a positive constant C such that
for all N and all $\pi \in \mathcal{P}_n$,

\[ |P(\Theta = \pi) - Q_{r,s,L}(\pi)| \le C/(\log N)^2. \]

Consider for concreteness N = 10,000, a number commonly used for the "effective size" of
the human population. To explain the term in quotes, we note that although there are now
6 billion humans, our exponential population growth is fairly recent, so for many measures of
genetic variability the human population is the same as a homogeneously mixing population of
constant size 10,000. When N = 10,000, $\log N = 9.210$ and $(\log N)^2 = 84.8$, so Theorem 1.2
may not appear at first glance to be a big improvement. Two concrete examples, however, show
that the improvement is dramatic. In each case $N = 10^4$ and s = 0.1. More extensive simulation
results comparing the two approximations are given in Durrett and Schweinsberg (2004a).
Here pinb is the probability that a lineage escapes the selective sweep. The remaining three
columns pertain to two lineages: p2inb is the probability that two lineages both escape the sweep
but do not coalesce, p2cinb is the probability both lineages escape but coalesce along the way, and
p1B1b is the probability one lineage escapes the sweep but the other does not. The remaining
possibility is that neither lineage escapes the sweep, but this probability can be computed by
subtracting the sum of the other three probabilities from one. The first row in each group gives
the probabilities obtained from the approximation in Theorem 1.1 and the third row gives the
probabilities obtained from the approximation in Theorem 1.2. The second row gives the average
of 10,000 simulation runs of the Moran model described earlier. The values of the recombination
rate r were chosen in the two examples to make the approximations to pinb given by Theorem
1.1 equal to 0.1 and 0.4 respectively. It is easy to see from the table that the approximation from
Theorem 1.2 is substantially more accurate. In particular, note that in the approximation given
by Theorem 1.1 two lineages never coalesce unless both can be traced back to the individual with
the beneficial mutation. Consequently, p2cinb would be zero if this approximation were correct.
However, in simulations, a significant percentage of pairs of lineages both coalesced and escaped
from the sweep, and this probability is approximated very well by Theorem 1.2 in both examples.
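
The recombination rates used in the two examples can be recovered by inverting the first approximation. Under Theorem 1.1 a single lineage escapes with probability 1 - exp(-r log(2N)/s), so setting this equal to a target value and solving for r gives the sketch below. The function name is hypothetical; the parameter values N = 10^4 and s = 0.1 are the ones quoted in the text.

```python
import math


def recombination_rate_for_escape(p_escape: float, two_n: int, s: float) -> float:
    """Return r such that 1 - exp(-r * log(two_n) / s) equals p_escape."""
    if not 0.0 <= p_escape < 1.0:
        raise ValueError("p_escape must lie in [0, 1)")
    return -s * math.log(1.0 - p_escape) / math.log(two_n)


if __name__ == "__main__":
    for target in (0.1, 0.4):
        r = recombination_rate_for_escape(target, 20_000, 0.1)
        print(target, r)
```

Both rates come out well below s = 0.1, consistent with the standing assumption that r is of order 1/log N.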

The results in this paper are a first step in studying situations in which, as proposed by
Gillespie (2000), selective sweeps occur at times of a Poisson process in a single locus or distributed
along a chromosome at different distances from the neutral locus at which data have been
collected. It is well-known that in the Moran model when there are no advantageous mutations, if
we sample n individuals and follow their ancestors backwards in time, then when time is sped up
by 2N, we get the coalescent process introduced by Kingman (1982). It is known (see Durrett,
2002) that selective sweeps require an average amount of time $(2/s) \log N$, so when time is sped
up by 2N, the selective sweep occurs almost instantaneously. Durrett and Schweinsberg (2004b)
show that Theorem 1.1 implies that if advantageous mutations occur at times of a Poisson process
then the ancestral processes converge as $N \to \infty$ to a coalescent with multiple collisions of the
type introduced by Pitman (1999) and Sagitov (1999). At times of a Poisson process, multiple
lineages coalesce simultaneously into one. The more accurate approximation in Theorem 1.2
suggests that a better approximation to the ancestral process can be given by a coalescent with
simultaneous multiple collisions. These coalescent processes were studied by Möhle and Sagitov
(2001) and Schweinsberg (2000).
Finally, it is important to emphasize that the results in this paper are for the case of "strong
selection", where the selective advantage s is O(1). There has also been considerable interest in
weak selection, where Ns is assumed to converge to a limit as $N \to \infty$, which means s is O(1/N).
In this case, there is a diffusion limit as $N \to \infty$. For work in this direction that incorporates the
effect of recombination, see Donnelly and Kurtz (1999) and Barton, Etheridge, and Sturm (2004).
Recently, Etheridge, Pfaffelhuber, and Wakolbinger (2005) have shown that many of the results
in this paper carry over to the diffusion setting. They assume that $Ns \to \alpha$ as $N \to \infty$, so that
they can work with a diffusion limit, and then obtain an approximation to the distribution of the
ancestral partition $\Theta$ which has an error of order $1/(\log \alpha)^2$ as $\alpha \to \infty$, by using approximations
to the genealogy similar to those used in the present paper.

2 Overview of the Proofs 

Since the proofs of Theorems 1.1 and 1.2 are rather long, we outline the proofs in this section.
A precise definition of the genealogy is given in subsection 2.1. The proof of Theorem 1.1 
is outlined in subsection 2.2. In subsection 2.3, we describe the coupling with a supercritical 
branching process and outline the proof of Theorem 1.2. 

2.1 Precise definition of the genealogy 

We now define more precisely our model of a selective sweep. We construct a process $M =
(M_t)_{t=0}^\infty$. The vector $M_t = (M_t(1), \ldots, M_t(2N))$ contains the information about the population
at the time of the tth proposed replacement, and $M_t(i) = (A_t^0(i), \ldots, A_t^{t-1}(i), B_t(i))$ contains the
information about the ancestors of the ith individual at time t. For $0 \le u \le t - 1$, $A_t^u(i)$ is the
individual at time u that is the ancestor of the ith individual at time t, when we consider the
genealogy at the neutral locus. The final coordinate $B_t(i) = 1$ if the ith individual at time t has
the B allele, and $B_t(i) = 0$ if this individual has the b allele. Note that this is a discrete-time
process, but one can easily recover the continuous-time description by replacing discrete time
steps with independent holding times, each having an exponential distribution with mean 1/(2N).

At time zero, only one of the chromosomes will have the B allele. We define a random variable
U, which is uniform on the set {1, . . . , 2N}, and we let $B_0(U) = 1$ and $B_0(i) = 0$ for $i \ne U$. We
now define a collection of independent random variables $I_{t,j}$ for $t \in \mathbb{N}$ and $j \in \{1, \ldots, 5\}$. For
$j \in \{1, 2, 3\}$, the random variable $I_{t,j}$ is uniform on {1, . . . , 2N}.

• $I_{t,1}$ will be the individual that dies at time t.

• $I_{t,2}$ will be the parent of the new individual at time t.

• $I_{t,3}$ will be the other parent from whom the new chromosome will inherit its allele at the
neutral locus if there is recombination.

• $I_{t,4}$ will be an indicator for whether a proposed disadvantageous change will be rejected, so
$P'(I_{t,4} = 1) = s$ and $P'(I_{t,4} = 0) = 1 - s$.

• $I_{t,5}$ will determine whether there is recombination at time t, so $P'(I_{t,5} = 1) = r$ and
$P'(I_{t,5} = 0) = 1 - r$.



[Figure 1 not reproduced. Labels in the figure: b population, B population, and the times $\tau_J$, R(i), G(i, j).]

Figure 1. A picture to explain our notation. The lineages jump around as we move backwards
in time, but for simplicity we have only indicated the recombination events. Here as we work
backwards in time i and j coalesce and then recombine into the b population. Proposition 2.4
shows that this event has probability at most $C/\log N$. Proposition 2.1 estimates the probability
of two recombinations as shown in lineage k.

Using these random variables we can construct the process in the obvious way. Refer to 
Figure 1 for help with the notation. 

1. If $B_{t-1}(I_{t,1}) = 1$, $B_{t-1}(I_{t,2}) = 0$, and $I_{t,4} = 1$, then the population will be the same at time
t as at time t - 1 because the proposed replacement of a B chromosome by a b chromosome
is rejected. In this case, for all i = 1, . . . , 2N we define $B_t(i) = B_{t-1}(i)$, $A_t^{t-1}(i) = i$, and
$A_t^u(i) = A_{t-1}^u(i)$ for u = 0, . . . , t - 2.

2. If we are not in the previous case and $I_{t,5} = 0$, then there is no recombination at time t.
So, the individual $I_{t,1}$ dies, and the new individual gets its alleles at both sites from $I_{t,2}$.
For $i \ne I_{t,1}$, define $B_t(i) = B_{t-1}(i)$, $A_t^u(i) = A_{t-1}^u(i)$ for u = 0, . . . , t - 2, and $A_t^{t-1}(i) = i$.
Let $B_t(I_{t,1}) = B_{t-1}(I_{t,2})$, $A_t^u(I_{t,1}) = A_{t-1}^u(I_{t,2})$ for u = 0, . . . , t - 2, and $A_t^{t-1}(I_{t,1}) = I_{t,2}$.

3. If we are not in either of the previous two cases, then there is recombination at time t.
This means that the new individual labeled $I_{t,1}$ gets a B or b allele from $I_{t,2}$ but gets its
allele at the neutral locus from $I_{t,3}$. For $i \ne I_{t,1}$, define $B_t(i) = B_{t-1}(i)$, $A_t^u(i) = A_{t-1}^u(i)$
for u = 0, . . . , t - 2, and $A_t^{t-1}(i) = i$. Let $B_t(I_{t,1}) = B_{t-1}(I_{t,2})$, $A_t^u(I_{t,1}) = A_{t-1}^u(I_{t,3})$ for
u = 0, . . . , t - 2, and $A_t^{t-1}(I_{t,1}) = I_{t,3}$.
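
The three cases above can be sketched as a forward simulation. To keep the example short, it tracks for each individual only its B/b state and its time-0 ancestor at the neutral locus, rather than the full ancestry vector. The function name `simulate_sweep`, the 0-based labelling of individuals, and the `max_steps` safety cap are assumptions made for this illustration.

```python
import random


def simulate_sweep(two_n, s, r, rng=None, max_steps=10**7):
    """Run the discrete-time two-locus Moran model until the B allele fixes or is lost.

    Returns (fixed, ancestor, u): fixed is True when all two_n individuals
    carry B, ancestor[i] is the time-0 ancestor of individual i at the neutral
    locus, and u is the individual that carried B at time 0.
    If max_steps is reached before absorption, fixed is False.
    """
    rng = rng or random.Random()
    u = rng.randrange(two_n)
    has_b = [False] * two_n
    has_b[u] = True
    ancestor = list(range(two_n))
    count_b = 1
    for _ in range(max_steps):
        if count_b in (0, two_n):
            break
        dies = rng.randrange(two_n)      # individual that dies
        parent = rng.randrange(two_n)    # parent at the selected locus
        other = rng.randrange(two_n)     # neutral-locus parent if recombination
        reject = rng.random() < s        # reject a disadvantageous replacement
        recombine = rng.random() < r     # recombination between the two loci
        if has_b[dies] and not has_b[parent] and reject:
            continue                     # case 1: nothing changes
        source = other if recombine else parent   # case 3 versus case 2
        new_ancestor = ancestor[source]
        count_b += int(has_b[parent]) - int(has_b[dies])
        has_b[dies] = has_b[parent]
        ancestor[dies] = new_ancestor
    return count_b == two_n, ancestor, u


if __name__ == "__main__":
    rng = random.Random(5)
    fixed, ancestor, u = simulate_sweep(200, 0.1, 0.002, rng)
    print(fixed, u, ancestor[:10])
```

Runs in which the B allele is lost have to be discarded when estimating quantities under the law conditioned on fixation.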

It will also be useful to have notation for the number of individuals with the favorable allele.
For nonnegative integers t, define $X_t = \#\{i : B_t(i) = 1\}$, where #S denotes the cardinality of
the set S. For J = 1, 2, . . . , 2N, let $\tau_J = \inf\{t : X_t \ge J\}$ be the first time at which the number of
B's in the population reaches J. Let $\tau = \inf\{t : X_t \in \{0, 2N\}\}$ be the time at which the B allele
becomes fixed in the population (in which case $X_\tau = 2N$) or disappears (in which case $X_\tau = 0$).
Since our main interest is in studying a selective sweep, P' and E' will denote probabilities
and expectations under the unconditional law of M, and P and E will denote probabilities and
expectations under the conditional law of M given $X_\tau = 2N$. Likewise, Var and Cov will always
refer to conditional variances and covariances given $X_\tau = 2N$.

To sample n individuals from the population at the time $\tau$ when the selective sweep ends,
we may take the individuals 1, . . . , n because the distribution of the genealogy of n individuals does
not depend on which n individuals are chosen. Therefore, we can define $\Theta$ to be the random
marked partition of {1, . . . , n} such that $i \sim_\Theta j$ if and only if the ith and jth individuals at
time $\tau$ get their allele at the neutral site from the same ancestor at time 0, with the marked
block corresponding to the individuals descended from the individual U, which had the beneficial
mutation at time zero. More formally, we have $i \sim_\Theta j$ if and only if $A_\tau^0(i) = A_\tau^0(j)$, with the
marked block being $\{i : A_\tau^0(i) = U\}$ or, equivalently, $\{i : B_0(A_\tau^0(i)) = 1\}$.
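
As a small illustration of this definition, the marked partition can be read off from the time-0 ancestors of the first n individuals. The sketch below assumes the ancestors are supplied as a list; the function name `marked_partition` is hypothetical.

```python
def marked_partition(ancestors, u, n):
    """Build the marked partition of {1, ..., n} from time-0 ancestors.

    ancestors[i - 1] is the time-0 ancestor (at the neutral locus) of the ith
    sampled individual, and u is the individual that carried the beneficial
    mutation at time 0. Returns (blocks, marked_block), where marked_block is
    None if no sampled individual descends from u.
    """
    blocks = {}
    for i in range(1, n + 1):
        blocks.setdefault(ancestors[i - 1], set()).add(i)
    marked = frozenset(blocks[u]) if u in blocks else None
    return {frozenset(b) for b in blocks.values()}, marked


if __name__ == "__main__":
    # Individuals 1, 2, 4 descend from ancestor 3, which carried the mutation.
    print(marked_partition([3, 3, 7, 3, 5], u=3, n=5))
```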

2.2 The first approximation 

Recall that Theorem 1.1 says that we can approximate $\Theta$ by flipping independent coins for each
lineage, which come up heads with probability p, to determine which lineages fail to escape the 
selective sweep. These lineages are then in one block of the partition, because they are descended 
from the ancestor with the beneficial mutation at time zero, while the other lineages do not 
coalesce and correspond to singleton blocks of the partition. 

The first step in establishing this picture is to calculate the probability that one lineage
escapes the selective sweep. In the notation above, we need to find $P(B_0(A_\tau^0(i)) = 0)$. Define
$R(i) = \sup\{t \ge 0 : B_t(A_\tau^t(i)) = 0\}$, where $\sup \emptyset = -\infty$. If we work backwards in time, R(i) is
the first moment that the lineage of the neutral locus resides in the b population. Note that it
is possible to have $R(i) \ge 0$ and $B_0(A_\tau^0(i)) = 1$ if a lineage is affected by two recombinations,
one taking it from the B population to the b population, and another taking it back into the B
population. The next result shows that the probability of this is small.

Proposition 2.1. $P(B_t(A_\tau^t(i)) = 1 \text{ for some } t < R(i)) \le C/(\log N)^2$.

Proposition 2.1 implies that in the proofs of Theorems 1.1 and 1.2, the probability that a lineage
escapes the selective sweep can be approximated by $P(R(i) \ge 0)$. It will also be useful to have an
approximation of $P(R(i) \ge \tau_J)$ for $J \ge 1$, which is the probability that a given lineage escapes
into the b population after the time when the number of B's in the population reaches J. The
next result gives such an approximation.

Proposition 2.2. If $q_J = 1 - \exp\left(-\frac{r}{s} \sum_{k=J+1}^{2N} \frac{1}{k}\right)$, then

Propositions 2.1 and 2.2 will be proved in Section 3.

The next step is to consider two lineages. We now need to consider not only recombination
but also the possibility that the lineages may coalesce, meaning that the alleles at the neutral
site on the two lineages are descended from the same ancestor at the beginning of the sweep. Let
G(i, j) be the time that the ith and jth lineages coalesce. More precisely, we define $G(i,j) =
\sup\{t : A_\tau^t(i) = A_\tau^t(j)\}$ with $\sup \emptyset = -\infty$. Our first result regarding coalescence shows that it
is unlikely for two lineages to coalesce at a given time unless both alleles at the neutral site are
descended from a chromosome with the B allele at that time.

Proposition 2.3. $P(G(i,j) \ge 0 \text{ and } B_{G(i,j)}(A_\tau^{G(i,j)}(i)) = 0) \le C(\log N)/N$.

Next, we bound the probability that, if we trace two lineages back through the selective sweep,
the lineages coalesce and then escape from the sweep.

Proposition 2.4. $P(0 \le R(i) \le G(i,j)) \le C/(\log N)$.

Note that Proposition 2.3 says that, with high probability, only lineages in the B population
merge, while Proposition 2.4 says that, in the first-order approximation, lineages that have merged
do not escape into the b population. Together, these results will justify the approximation of $\Theta$
by a random partition in which the only non-singleton block corresponds to lineages that fail
to escape the selective sweep. The next result bounds the probability that two lineages coalesce
after time $\tau_J$.

Proposition 2.5. Let C' > 0. If $J \le C'N/(\log N)$, then $P(G(i,j) \ge \tau_J) \le C/J$.

We prove Propositions 2.3, 2.4, and 2.5 in Section 4.

We now consider n lineages. To prove Theorem 1.1 we will need to show that the events
$\{R(1) \ge 0\}, \ldots, \{R(n) \ge 0\}$ are approximately independent. Let $K_t = \#\{i \in \{1, \ldots, n\} : R(i) \ge
t\}$. If the events that the n lineages escape the selective sweep after time t are approximately
independent, then $K_t$ should have approximately a binomial distribution. The following
proposition, which we prove in Section 5, provides a binomial approximation to the distribution of $K_{\tau_J}$.
Since $\tau_1 = 0$, the J = 1 case will be used in the proof of Theorem 1.1, while the general case will
help to prepare us for the proof of Theorem 1.2.

Proposition 2.6. Define qj as in Provosition \2JA If J < C'N/{logN), then 
P{K,j=d)-(^''^qj{l-qj) 

Proof of Theorem M.ll Define a new partition 0' of {1, . . . , n} such that i ~0/ j if and only if 
R{i) = R{i) = —oo. We mark the block of B' consisting of {i : R{i) = — oo}. In words, only the 
lineages that recombine and hence stay in the B population are trapped by the sweep. To do 
this we observe 

• Proposition 2.1 implies that the probaility of two recombinations affecting a lineage can be 
ignored. 

• Proposition 2.3 says that we can ignore coalescence in the b population. 

• Propostion 2.4 says that the probability two lineages coalesce and then escape has small 
probaility. 



-^"^"|i^'7}+(i^ /or<i = 0,l,...,n. 
The results above imply that $P(\Theta \ne \Theta') \le C/(\log N)$. Therefore, to prove Theorem 1.1 it
suffices to show that $|P(\Theta' = \pi) - Q_p(\pi)| \le C/(\log N)$ for all marked partitions $\pi$ of $\{1,\dots,n\}$.
It follows from Proposition 2.6 with $J = 1$ and the exchangeability of $\Theta'$ that
$|P(\Theta' = \pi) - Q_{1-q_1}(\pi)| \le C/(\log N)$ for all $\pi \in \mathcal{P}_n$. Using the definition of $q_1$ and
$|\frac{d}{dx} e^{-x}| \le 1$ for $x \ge 0$ gives
\[
|(1 - q_1) - p|
= \left| \exp\left( -\frac{r}{s} \sum_{k=2}^{2N} \frac{1}{k} \right) - \exp\left( -\frac{r}{s} \log(2N) \right) \right|
\le \frac{r}{s} \left| \sum_{k=2}^{2N} \frac{1}{k} - \log(2N) \right|
\le \frac{C}{\log N},
\]
and the theorem follows. □
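
The last estimate can be checked numerically. The following is a minimal sketch, assuming $p = e^{-(r/s)\log(2N)}$ and $q_1 = 1 - \exp(-(r/s)\sum_{k=2}^{2N} 1/k)$; the function name `escape_gap` and the parameter choices $r = 1/\log N$, $s = 0.1$ are illustrative assumptions, not values taken from the paper.

```python
import math

def escape_gap(N, r, s):
    """Return |(1 - q_1) - p| for the assumed forms of q_1 and p."""
    harmonic_tail = sum(1.0 / k for k in range(2, 2 * N + 1))
    one_minus_q1 = math.exp(-(r / s) * harmonic_tail)
    p = math.exp(-(r / s) * math.log(2 * N))
    return abs(one_minus_q1 - p)

for N in (10**3, 10**4, 10**5):
    r = 1.0 / math.log(N)  # assumed scaling r = C_0 / log N with C_0 = 1
    s = 0.1                # assumed selection coefficient
    # The gap should be at most r/s, since |sum_{k=2}^{2N} 1/k - log(2N)| <= 1.
    print(N, escape_gap(N, r, s), r / s)
```

Because the harmonic sum and the logarithm differ by a bounded amount, the gap is of order r/s, which is the source of the C/(log N) error term.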



2.3 Branching process coupling and the second approximation 

We now work towards improving the approximation to the distribution of $\Theta$ so that we can prove
Theorem 1.2. To do this, we will break the selective sweep into two stages. Let $J = \lfloor (\log N)^a \rfloor$,
where $a > 4$ is a fixed constant. We will consider separately the time intervals $[0, \tau_J)$ and $[\tau_J, \tau]$.

Part 1. $\Theta \approx \Theta_1 \approx \Theta_2$.

We first establish that we can ignore coalescence involving a lineage that escapes the sweep
after time $\tau_J$. Define a random marked partition $\Theta_1$ of $\{1,\dots,n\}$ such that $i \sim_{\Theta_1} j$ if and only
if $R(i) < \tau_J$, $R(j) < \tau_J$, and $A_0^{\tau}(i) = A_0^{\tau}(j)$. Mark the block of $\Theta_1$ consisting of
$\{i : R(i) < \tau_J \text{ and } B_0(A_0^{\tau}(i)) = 1\}$. Note that $\Theta_1 = \Theta$ unless, for some i and j, we have
$R(i) \ge \tau_J$ and either $i \sim_{\Theta} j$ or $B_0(A_0^{\tau}(i)) = 1$. It follows from Propositions 2.1, 2.3, and 2.5
that $P(\Theta \ne \Theta_1) \le C/(\log N)^2$. Thus, we may now work with $\Theta_1$.

The next step is to approximate the distribution of $\Theta_1$. Let $K_t = \{i \in \{1,\dots,n\} : R(i) \ge t\}$,
as defined before the statement of Proposition 2.6. Define $m = n - \#K_{\tau_J}$ to be the number
of lineages in the B population at time $\tau_J$. Proposition 2.5 shows that lineages are unlikely to
coalesce in $[\tau_J, \tau]$. Relabel the lineages using an arbitrary bijective function $f$ from
$\{1,\dots,n\} \setminus K_{\tau_J}$ to $\{1,\dots,m\}$.

To describe the first stage of the selective sweep precisely, we define, for each $m \le J$, a
new marked partition $\Psi_m$ of $\{1,\dots,m\}$. Let $\sigma_m$ be a random injective map from $\{1,\dots,m\}$ to
$\{i : B_{\tau_J}(i) = 1\}$ such that all $(J)_m = J(J-1)\cdots(J-m+1)$ maps are equally likely. Thus,
$\sigma_m(1), \dots, \sigma_m(m)$ is a random sample from the J individuals with the B allele at time $\tau_J$. Then
define $\Psi_m$ such that $i \sim_{\Psi_m} j$ if and only if $A_0^{\tau_J}(\sigma_m(i)) = A_0^{\tau_J}(\sigma_m(j))$. This means i and
j are in the same block of $\Psi_m$ if and only if the $\sigma_m(i)$th and $\sigma_m(j)$th individuals at time $\tau_J$
inherited their allele at the neutral locus from the same individual at the beginning of the sweep.
The block $\{i : B_0(A_0^{\tau_J}(\sigma_m(i))) = 1\}$ is marked.

Define $\Theta_2$ to be the marked partition of $\{1,\dots,n\}$ such that $i \sim_{\Theta_2} j$ if and only if $R(i) < \tau_J$,
$R(j) < \tau_J$, and $f(i) \sim_{\Psi_m} f(j)$. Let the marked block of $\Theta_2$ consist of all i such that $R(i) < \tau_J$
and $f(i)$ is in the marked block of $\Psi_m$. To compare $\Theta_1$ and $\Theta_2$, note that $f(i) \sim_{\Psi_m} f(j)$ if
and only if $A_0^{\tau_J}(\sigma_m(f(i))) = A_0^{\tau_J}(\sigma_m(f(j)))$. On the other hand, $A_0^{\tau}(i) = A_0^{\tau}(j)$ if and only if
$A_0^{\tau_J}(A_{\tau_J}^{\tau}(i)) = A_0^{\tau_J}(A_{\tau_J}^{\tau}(j))$. For $i \ne j$, we have $P(A_{\tau_J}^{\tau}(i) = A_{\tau_J}^{\tau}(j)) \le C/(\log N)^2$ by Proposition
2.5. By the strong Markov property, the genealogy of the process up to time $\tau_J$ is independent of
$K_{\tau_J}$. From these observations and the exchangeability of the model, it follows that for all $\pi \in \mathcal{P}_n$,
we have $|P(\Theta_1 = \pi) - P(\Theta_2 = \pi)| \le C/(\log N)^2$.

Part 2. $P(\Psi_m = \pi) \approx Q_{r,s,\lfloor Js \rfloor}(\pi)$.
Our next step is to understand the distribution of $\Psi_m$. The first step is to show that the first
stage of a selective sweep can be approximated by a branching process. Recall that when the
number of individuals with the favorable B allele is $k \ll 2N$, the rate of transitions that increase
the number of B individuals from k to k + 1 is $k(2N-k)/2N \approx k$, while the rate of transitions that
decrease the number of B individuals from k to k − 1 is $k(2N-k)(1-s)/2N \approx k(1-s)$. Therefore,
the individuals with the B allele follow approximately a continuous-time branching process in
which each individual gives birth at rate one and dies at rate 1 − s. Also, each new individual
born with the B allele inherits the allele at the neutral site from its parent with probability 1 − r.
We can model this recombination by considering a multi-type branching process starting from
one individual in which each new individual is the same type as its parent with probability 1 − r
and is a new type, different from any other member of the current population, with probability
r.

Say that an individual in the branching process at time t has an infinite line of descent if it 
has a descendant in the population at time u for all u > t. Otherwise, say the individual has a 
finite line of descent. It is well-known that the process consisting only of the individuals with 
an infinite line of descent is also a branching process. This is discussed, for example, in Athreya 
and Ney (1972). For more recent work in this direction, see O'Connell (1993) and Gadag and 
Rajarshi (1987, 1992). In Section 6, we will show that when the original branching process is 
a continuous-time branching process with births at rate 1 and deaths at rate 1 — s, the process 
consisting only of the individuals with an infinite line of descent is a continuous-time branching 
process with no deaths in which each individual gives birth at rate s. That is, this process is a 
Yule process with births at rate s. The probability that a randomly chosen individual has an 
infinite line of descent is s, so when the original branching process has J individuals, there are 
approximately Js individuals with an infinite line of descent. Furthermore, since the past and 
future are independent by the Markov property, the genealogy of a sample will not be affected if 
we sample only from the individuals with infinite lines of descent. 
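
The claim that an individual has an infinite line of descent with probability s can be illustrated with a short computation. The sketch below is not from the paper: it assumes the standard first-step argument, in which the first event affecting an individual is a birth with probability 1/(2 − s) and a death with probability (1 − s)/(2 − s), so the extinction probability q solves q = (1 − s)/(2 − s) + q²/(2 − s). The function name and iteration count are illustrative.

```python
def extinction_probability(s, iterations=10_000):
    """Smallest nonnegative solution of q = down + up * q**2, found by iteration from 0."""
    up = 1.0 / (2.0 - s)           # the next event for the individual is a birth
    down = (1.0 - s) / (2.0 - s)   # the next event for the individual is its death
    q = 0.0
    for _ in range(iterations):
        q = down + up * q * q
    return q

for s in (0.5, 0.1, 0.01):
    # The survival probability 1 - q should agree with s.
    print(s, 1.0 - extinction_probability(s))
```

The quadratic factors as (q − 1)(q − (1 − s)) = 0, so the iteration started at 0 converges to the smaller root 1 − s.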

In Section 6, we justify these approximations. This will lead to a proof of the following
proposition, which explains how the genealogy of the first phase of a selective sweep can be 
approximated by the genealogy of a continuous-time branching process. 

Proposition 2.7. Consider a continuous-time multi-type branching process started with one
individual at time zero such that each individual gives birth at rate one and dies at rate 1 − s.
Assume that each individual born has the same type as its parent with probability 1 − r and a new
type with probability r. Condition this process to survive forever. At the first time at which there
are $\lfloor Js \rfloor$ individuals with an infinite line of descent, sample m of the $\lfloor Js \rfloor$ individuals with an
infinite line of descent. Define $\Upsilon_m$ to be the marked partition of $\{1,\dots,m\}$ such that $i \sim_{\Upsilon_m} j$ if
and only if the ith and jth individuals in the sample have the same type, and the marked block
consists of the individuals with the same type as the individual at time zero. Then for all $\pi \in \mathcal{P}_m$,
we have $|P(\Psi_m = \pi) - P(\Upsilon_m = \pi)| \le C/(\log N)^2$.

Recall that in the introduction we constructed a random marked partition $\Pi$ with distribution
$Q_{r,s,L}$, where $L = \lfloor 2Ns \rfloor$. To compare this partition with $\Theta$, we will consider the construction
in two stages, just as we considered two stages of the selective sweep. The first stage of the
construction will involve the integers i such that $Z_i \le \lfloor Js \rfloor$, and the second stage involves the
integers i such that $Z_i > \lfloor Js \rfloor$. We think of $Z_i = k$ as meaning that the ith lineage escapes the
selective sweep at a time when there are k individuals in the Yule process (or, equivalently, k
lineages in the branching process with an infinite line of descent). We use $\lfloor Js \rfloor$ as the boundary
between the two stages because, when the population size of the branching process reaches J,
there are approximately Js individuals with an infinite line of descent.

The next result compares the first stage of a selective sweep to the random variables $Z_i$ such
that $Z_i \le \lfloor Js \rfloor$.

Proposition 2.8. There is a positive constant C such that for all partitions $\pi$ of $\{1,\dots,n\}$, we
have $|P(\Upsilon_n = \pi) - Q_{r,s,\lfloor Js \rfloor}(\pi)| \le C/(\log N)^2$.

Part 3. $\Theta_2 \approx Q_{r,s,\lfloor Js \rfloor,q_J} \approx Q_{r,s,L}$.

Proposition 2.6 shows that the number of lineages that escape the sweep during $[\tau_J, \tau]$ has
approximately a binomial distribution with success probability $q_J$. This motivates the following:

Definition 2.9. Let r, s, and q be in (0,1), and let L be a positive integer. Let $Q_{r,s,L,q}$ be the
distribution of the random marked partition $\Pi'$ of $\{1,\dots,n\}$ obtained as follows. First, let $\Pi$ be
a random marked partition of $\{1,\dots,n\}$ with distribution $Q_{r,s,L}$. Let $\xi_1, \dots, \xi_n$ be i.i.d. random
variables such that $P(\xi_i = 1) = q$ and $P(\xi_i = 0) = 1 - q$. Then say that $i \sim_{\Pi'} j$ if and only if
$i \sim_{\Pi} j$ and $\xi_i = \xi_j = 0$. Mark the block of $\Pi'$ consisting of all integers i in the marked block of $\Pi$
such that $\xi_i = 0$.
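
The thinning step in Definition 2.9 can be sketched in code. This is only an illustration of the construction as stated: the representation of a marked partition as a list of sets together with a marked set, and the function name `thin_partition`, are assumptions made for the example. Integers i with $\xi_i = 1$ are removed from their block and become singletons.

```python
def thin_partition(blocks, marked, xi):
    """Apply the thinning of Definition 2.9 to a marked partition.

    blocks: list of disjoint sets covering {1, ..., n}
    marked: the marked block (a set, possibly empty)
    xi:     dict mapping each integer i to 0 or 1
    Returns (new_blocks, new_marked), with new_blocks sorted by smallest element.
    """
    new_blocks = []
    for block in blocks:
        kept = {i for i in block if xi[i] == 0}
        if kept:
            new_blocks.append(kept)
        # Integers with xi = 1 are related to no other integer, so they are singletons.
        new_blocks.extend({i} for i in block if xi[i] == 1)
    new_marked = {i for i in marked if xi[i] == 0}
    return sorted(new_blocks, key=min), new_marked

blocks = [{1, 2, 3}, {4, 5}]
xi = {1: 0, 2: 1, 3: 0, 4: 0, 5: 0}
print(thin_partition(blocks, {1, 2, 3}, xi))
```

In the i.i.d. setting of the definition, `xi` would be drawn with $P(\xi_i = 1) = q$; a fixed `xi` is used here so the outcome is deterministic.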

The next two propositions establish the connection between the second stage of the construction
of $\Pi$ and the second stage of the selective sweep. Proposition 2.10 shows that it is unlikely to
have $Z_i = Z_j$ if both are greater than $\lfloor Js \rfloor$, just as Proposition 2.5 shows that lineages are unlikely to
coalesce during the second stage of a selective sweep. Likewise, Proposition 2.11 shows that the
number of $Z_i$ greater than $\lfloor Js \rfloor$ has approximately a binomial distribution, just as Proposition
2.6 shows that the number of lineages that escape the selective sweep during the second stage
has approximately a binomial distribution.

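Proposition 2.10. $P(Z_i = Z_j > \lfloor Js \rfloor) \le C/(\log N)^2$ for all $i \ne j$.

Proposition 2.11. Let $D = \#\{i : Z_i > \lfloor Js \rfloor\}$, and define $q_J$ as in Proposition 2.2. Then
\[
\left| P(D = d) - \binom{n}{d} q_J^d (1 - q_J)^{n-d} \right| \le \frac{C}{(\log N)^2}
\quad \text{for } d = 0, 1, \dots, n.
\]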



Propositions 2.8, 2.10, and 2.11 are proved in Section 7. The proofs of Propositions 2.10 and
2.11 are straightforward, but the proof of Proposition 2.8 is more difficult. It involves considering
marked partitions $\pi$ with different numbers of blocks and doing combinatorial calculations in
each case.

Proof of Theorem 1.2. By Propositions 2.7 and 2.8, we have $|P(\Psi_n = \pi) - Q_{r,s,\lfloor Js \rfloor}(\pi)| \le
C/(\log N)^2$ for all $\pi \in \mathcal{P}_n$. It follows from this fact, Proposition 2.6, and the construction of $\Theta_2$
that $|P(\Theta_2 = \pi) - Q_{r,s,\lfloor Js \rfloor,q_J}(\pi)| \le C/(\log N)^2$ for all $\pi \in \mathcal{P}_n$. Also, by defining $\xi_i = 1_{\{Z_i > \lfloor Js \rfloor\}}$
and applying Propositions 2.10 and 2.11, we see that $|Q_{r,s,\lfloor Js \rfloor,q_J}(\pi) - Q_{r,s,L}(\pi)| \le C/(\log N)^2$
for all $\pi \in \mathcal{P}_n$. This observation, combined with the discussion in Part 1 of this subsection,
completes the proof of Theorem 1.2. □
3 Recombination of one lineage



Our goal in this section is to prove Propositions 2.1 and 2.2, which pertain to the recombination
probabilities for a single lineage. The strategy will be to study the process $X = (X_t)_{t=0}^{\infty}$, which
describes how the number of individuals with the B allele evolves during the selective sweep, and
then calculate recombination probabilities conditional on the process X. In subsection 3.1, we
show that the process X behaves like an asymmetric random walk, and work out some calculations
that will be needed later. We prove Proposition 2.1 in subsection 3.2 and Proposition 2.2 in
subsection 3.3.



3.1 Random walk calculations 

Suppose $1 \le X_{t-1} \le 2N-1$. Then $X_t = X_{t-1} + 1$ if and only if $B_{t-1}(I_{t,1}) = 0$ and $B_{t-1}(I_{t,2}) = 1$.
Also, $X_t = X_{t-1} - 1$ if and only if $B_{t-1}(I_{t,1}) = 1$, $B_{t-1}(I_{t,2}) = 0$, and $I_{t,4} = 0$. Otherwise,
$X_t = X_{t-1}$. It follows that, for $1 \le k \le 2N-1$,
\[
P'(X_t = X_{t-1} + 1 \mid X_{t-1} = k) = \frac{k(2N-k)}{(2N)^2}, \tag{3.1}
\]
\[
P'(X_t = X_{t-1} - 1 \mid X_{t-1} = k) = \frac{k(2N-k)(1-s)}{(2N)^2}, \tag{3.2}
\]
\[
P'(X_t = X_{t-1} \mid X_{t-1} = k) = \frac{k^2 + (2N-k)^2 + sk(2N-k)}{(2N)^2}. \tag{3.3}
\]

Let $S_0 = 0$, and, for $m \ge 1$, let $S_m = \inf\{t > S_{m-1} : X_t \ne X_{t-1}\}$ be the time of the mth jump.

It follows from (3.1) and (3.2) that the process $(X_{S_m})_{m=0}^{\infty}$ is a random walk on $\{0, 1, \dots, 2N\}$
that starts at 1, at each step moves to the right with probability $1/(2-s)$ and to the left with
probability $(1-s)/(2-s)$, and is absorbed when it first reaches 0 or 2N. A standard calculation
for random walks (see e.g., Section 3.1 of Durrett (2002)) gives the following result.

Lemma 3.1. Let $p(a,b,k) = P'(\inf\{s > t : X_s = b\} < \inf\{s > t : X_s = a\} \mid X_t = k)$ be the
probability that if the number of B's is k, then the number of B's will reach b before a. For
$0 \le a < k < b \le 2N$,
\[
p(a,b,k) = \frac{1 - (1-s)^{k-a}}{1 - (1-s)^{b-a}}
\quad \text{and} \quad
P'(X_\tau = 2N) = p(0,2N,1) = \frac{s}{1 - (1-s)^{2N}}.
\]
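
The hitting probability in Lemma 3.1 can be verified exactly for small cases. The sketch below assumes the gambler's-ruin form $p(a,b,k) = (1-(1-s)^{k-a})/(1-(1-s)^{b-a})$ and checks, in exact rational arithmetic, that it satisfies the one-step equation of the jump chain with boundary values 0 at a and 1 at b. The function names are illustrative.

```python
from fractions import Fraction

def p_hit(a, b, k, s):
    """Assumed closed form for the probability of reaching b before a from k."""
    ratio = 1 - s
    return (1 - ratio ** (k - a)) / (1 - ratio ** (b - a))

def satisfies_one_step_equation(a, b, s):
    """Check p(k) = up * p(k+1) + down * p(k-1) for all a < k < b."""
    up = 1 / (2 - s)
    down = (1 - s) / (2 - s)
    return all(
        p_hit(a, b, k, s) == up * p_hit(a, b, k + 1, s) + down * p_hit(a, b, k - 1, s)
        for k in range(a + 1, b)
    )

s = Fraction(1, 10)
print(satisfies_one_step_equation(0, 20, s))   # exact check with rationals
print(p_hit(0, 20, 1, s) == s / (1 - (1 - s) ** 20))
```

Using `Fraction` avoids floating-point error, so equality is tested exactly rather than up to a tolerance.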



Given $1 \le j \le 2N-1$ and $1 \le k \le 2N-1$, we define the following quantities:

up jumps $U_{k,j} = \#\{t \ge \tau_j : X_t = k \text{ and } X_{t+1} = k+1\}$

down jumps $D_{k,j} = \#\{t \ge \tau_j : X_t = k \text{ and } X_{t+1} = k-1\}$

holds $H_{k,j} = \#\{t \ge \tau_j : X_t = k \text{ and } X_{t+1} = k\}$

total $T_{k,j} = U_{k,j} + D_{k,j} + H_{k,j}$

Also, let $U_k = U_{k,1}$, $D_k = D_{k,1}$, $H_k = H_{k,1}$, and $T_k = T_{k,1}$. The expected values of these
quantities are given in the following lemma.
Lemma 3.2. Suppose $1 \le j \le 2N-1$ and $1 \le k \le 2N-1$. Define
\[
q_k = \frac{p(k,2N,k+1)}{p(0,2N,k+1)} = \frac{s\,(1-(1-s)^{2N})}{(1-(1-s)^{2N-k})\,(1-(1-s)^{k+1})}.
\]
Also, define $q_0 = 1$. Define $r_{k,j} = 1$ for $j \le k$, and let $r_{0,j} = 0$. If $j > k \ge 1$, let
\[
r_{k,j} = 1 - \frac{p(k,2N,j)}{p(0,2N,j)}
= 1 - \frac{(1-(1-s)^{j-k})\,(1-(1-s)^{2N})}{(1-(1-s)^{2N-k})\,(1-(1-s)^{j})}.
\]
Then $E[U_{k,j}] = r_{k,j}/q_k$. Also, $E[D_{k,j}] = (1/q_{k-1}) - 1$ for $k > j$ and $E[D_{k,j}] = r_{k-1,j}/q_{k-1}$ for
$k \le j$. Furthermore,
\[
E[H_{k,j}] = E[U_{k,j} + D_{k,j}] \left( \frac{1}{\beta_k (2-s)} \right) \le \frac{\min\{(1-s)^{j-k}, 1\}}{s \beta_k}, \tag{3.4}
\]
where $\beta_k = k(2N-k)/(k^2 + (2N-k)^2 + sk(2N-k))$.

Proof. First, suppose $k \ge j$. On the event $\{X_\tau = 2N\}$, we have $X_t = k$ and $X_{t+1} = k+1$
for some $t \ge \tau_j$. Note that $P'(X_s > k \text{ for all } s > t \mid X_t = k+1) = p(k,2N,k+1)$ for all t,
so $P(X_s > k \text{ for all } s > t \mid X_t = k+1) = p(k,2N,k+1)/p(0,2N,k+1) = q_k$. It follows that
the distribution of $U_{k,j}$ is Geometric($q_k$), so $E[U_{k,j}] = 1/q_k$. If instead $k < j$, then
$P(X_t > k \text{ for all } t \ge \tau_j) = p(k,2N,j)/p(0,2N,j)$. Therefore, $P(T_{k,j} \ge 1) = r_{k,j}$. It follows from the
strong Markov property that, conditional on $T_{k,j} \ge 1$, the distribution of $U_{k,j}$ is Geometric($q_k$),
so $E[U_{k,j}] = r_{k,j}/q_k$.

To find $E[D_{k,j}]$, note that if $k > j$ then X takes a downward step from k to k − 1 after each
step from k − 1 to k except the last one, so $D_{k,j} = U_{k-1,j} - 1$. If $k \le j$, then the number of steps
after $\tau_j$ from k to k − 1 is the same as the number of steps from k − 1 to k, so $D_{k,j} = U_{k-1,j}$.
The formulas for $E[D_{k,j}]$ follow immediately from these observations.

Let $p_k = P(X_t \ne X_{t-1} \mid X_{t-1} = k)$. To prove (3.4), note that (3.3) gives
\[
p_k = \frac{k(2N-k)(2-s)}{(2N)^2}.
\]
It follows that the conditional distribution of $T_{k,j}$ given $U_{k,j}$ and $D_{k,j}$ is the same as the
distribution of the sum of $U_{k,j} + D_{k,j}$ independent random variables with a Geometric($p_k$) distribution.
Therefore,
\[
E[H_{k,j}] = E[T_{k,j}] - E[U_{k,j}] - E[D_{k,j}] = E[U_{k,j} + D_{k,j}] \left( \frac{1}{p_k} - 1 \right).
\]
Straightforward algebraic manipulations give $1/p_k - 1 = 1/[\beta_k(2-s)]$, which implies the equality
in (3.4). To check the inequality in (3.4), note that $1/q_k \le 1/s$ for all k, so if $k > j$ then
\[
E[U_{k,j} + D_{k,j}] = \frac{1}{q_k} + \frac{1}{q_{k-1}} - 1 \le \frac{1}{s} + \frac{1}{s} - 1 = \frac{2-s}{s},
\]
and if $k \le j$ then, since $r_{k,j} \le (1-s)^{j-k}$,
\[
E[U_{k,j} + D_{k,j}] = \frac{r_{k,j}}{q_k} + \frac{r_{k-1,j}}{q_{k-1}}
\le \frac{(1-s)^{j-k}}{s} + \frac{(1-s)^{j-k+1}}{s} = \frac{(2-s)(1-s)^{j-k}}{s}. \qquad \Box
\]
We will now calculate the probability that the ancestor at time t has the opposite B or b
allele from the ancestor at time t − 1, given that $X_{t-1} = k$ and $X_t = l$, where $1 \le k \le 2N-1$,
$1 \le l \le 2N$, and $|k - l| \le 1$. All of these recombination probabilities are the same under P' and
P because of the conditioning on $X_{t-1}$ and $X_t$. We define
\[
p_B^r(k,l) = P(B_{t-1}(A_{t-1}^{t}(i)) = 0 \mid X_{t-1} = k, X_t = l, B_t(i) = 1),
\]
\[
p_b^r(k,l) = P(B_{t-1}(A_{t-1}^{t}(i)) = 1 \mid X_{t-1} = k, X_t = l, B_t(i) = 0).
\]

Lemma 3.3. We have
\[
p_B^r(k,k-1) = p_b^r(k,k+1) = 0,
\]
\[
p_B^r(k,k+1) = \frac{r(2N-k)}{(k+1)(2N)}, \qquad
p_b^r(k,k-1) = \frac{rk}{(2N-k+1)(2N)},
\]
\[
p_B^r(k,k) = p_b^r(k,k) = \frac{rk(2N-k)}{2N[k^2 + (2N-k)^2 + sk(2N-k)]} = \frac{r\beta_k}{2N}.
\]



Proof. We will prove three of the six results; the others are similar. If $X_{t-1} = k$ and
$X_t = k+1$, then the new individual born at time t has a B allele. Therefore, if $B_t(i) = 0$ then
$B_{t-1}(A_{t-1}^{t}(i)) = 0$, so $p_b^r(k,k+1) = 0$. Suppose instead $B_t(i) = 1$. Then $B_{t-1}(A_{t-1}^{t}(i)) = 0$ if
and only if $I_{t,1} = i$ (meaning that the ith individual is the new one born), $I_{t,5} = 1$ (meaning that
there is recombination), and $B_{t-1}(I_{t,3}) = 0$ (meaning that the new individual gets its allele at
the neutral site from a member of the b population). Conditional on $X_{t-1} = k$, $X_t = k+1$,
and $B_t(i) = 1$, the probabilities of $I_{t,1} = i$, $I_{t,5} = 1$, and $B_{t-1}(I_{t,3}) = 0$ are $1/(k+1)$, r,
and $(2N-k)/2N$ respectively. Multiplying them gives the expression for $p_B^r(k,k+1)$. To
calculate $p_B^r(k,k)$ we use the fact that, conditional on $X_{t-1} = k$ and $X_t = k$, the probability
that $B_{t-1}(I_{t,1}) = B_{t-1}(I_{t,2}) = 1$ is $k^2/[k^2 + (2N-k)^2 + sk(2N-k)]$. Multiplying by $1/k$, r, and
$(2N-k)/2N$ gives $p_B^r(k,k)$. □
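
The last step of the proof is a product of four factors, and it can be checked exactly that this product equals $r\beta_k/(2N)$. The sketch below is an illustration with assumed parameter values (N = 25, s = 1/10, r = 1/50) and hypothetical function names; it uses rational arithmetic so the comparison is exact.

```python
from fractions import Fraction

def beta(k, N, s):
    """beta_k = k(2N-k) / (k^2 + (2N-k)^2 + s k (2N-k))."""
    return Fraction(k * (2 * N - k)) / (k**2 + (2 * N - k) ** 2 + s * k * (2 * N - k))

def recomb_prob_hold(k, N, s, r):
    """Product of the factors used for the hold case in the proof of Lemma 3.3."""
    stay_weight = k**2 + (2 * N - k) ** 2 + s * k * (2 * N - k)
    b_replaces_b = Fraction(k**2) / stay_weight  # a B replaces a B, given that X holds at k
    # 1/k: lineage i is the new individual; r: recombination; (2N-k)/2N: neutral allele from b
    return b_replaces_b * Fraction(1, k) * r * Fraction(2 * N - k, 2 * N)

N, s, r = 25, Fraction(1, 10), Fraction(1, 50)
print(all(recomb_prob_hold(k, N, s, r) == r * beta(k, N, s) / (2 * N) for k in range(1, 2 * N)))
```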

3.2 Bounding the probability of two recombinations 

We now begin working towards a proof of Proposition 2.1, which shows that it is unlikely that
a lineage will go from the B population to the b population and then back to the B population
because of two recombination events. We begin by proving two simple lemmas. Lemma 3.4
bounds the probability that the number of individuals with the B allele is k at the recombination
time R(i). Lemma 3.5 is a useful deterministic result, which can be proved easily by splitting the
sum into terms with $j \le N/2$ and $j > N/2$.

Lemma 3.4. We have $P(X_{R(i)} = k) \le r/(ks)$.

Proof. Considering the cases $X_{R(i)+1} = k+1$ and $X_{R(i)+1} = k$ and using Lemmas 3.2 and 3.3,
\[
P(X_{R(i)} = k) \le p_B^r(k,k+1) E[U_k] + p_B^r(k,k) E[H_k]
\le \frac{r(2N-k)}{(k+1)(2Ns)} + \frac{r}{2Ns}
\le \frac{r(2N-k) + rk}{2Nks} = \frac{r}{ks}. \qquad \Box
\]
Lemma 3.5. If $a > 1$, there is a C depending on a but not on N so that $\sum_{j=1}^{N} a^j/j \le C a^N/N$.

Proof of Proposition 2.1. Denote the time of the second recombination event by $R_2(i) = \sup\{t <
R(i) : B_t(A_t^{\tau}(i)) = 1\}$, where $\sup \emptyset = -\infty$. Our goal is to show $P(R_2(i) \ge 0) \le C/(\log N)^2$.
Note that by symmetry, the conditional distribution of $(X_t)_{t=0}^{\tau}$ given $X_\tau = 2N$ is the same as
the conditional distribution of $(2N - X_{\tau-t})_{t=0}^{\tau}$ given $X_\tau = 2N$. It follows from this fact and the
strong Markov property that
\[
E[\#\{t < R(i) : X_t = k \text{ and } X_{t+1} = k+1\} \mid X_{R(i)} = j] = E[U_{2N-k-1,2N-j}],
\]
\[
E[\#\{t < R(i) : X_t = k \text{ and } X_{t+1} = k-1\} \mid X_{R(i)} = j] = E[D_{2N-k+1,2N-j}],
\]
\[
E[\#\{t < R(i) : X_t = k \text{ and } X_{t+1} = k\} \mid X_{R(i)} = j] = E[H_{2N-k,2N-j}].
\]
Therefore, by Lemmas 3.2 and 3.3,
\[
P(X_{R_2(i)} = k \mid X_{R(i)} = j) \le p_b^r(k,k-1) E[D_{2N-k+1,2N-j}] + p_b^r(k,k) E[H_{2N-k,2N-j}].
\]
Using Lemma 3.4,

j=l •> \ k=j k=\ ^ 

Since r"^ j < C/(logiV)^, it suffices to show that the sum on the right-hand side of 1)3. 5p is 
bounded as ^ oo. We will handle the two terms separately. For the first term, we change 
variables £ = 2N — k and use Lemma 13.51 to get the bound 

If'^' jl- s)'-^ \ _ (1 - s)'^-^ ( ^fi' / 1 Yl 



A. j(2N-j) - N friJ ~ N 
The second term in the sum on the right-hand side of ()3.5|) can be bounded by 

AT , / . N 2Af-l , / 2N-1 , X , 2N~1 , 2Af-l 



EtUf VE ^ E 7 =i4E7 E (1 



N ^ N\ ^ £ J N 

j=N+l ^e=2N-j ^ i=\ j=2N- 



< 3. □ 
3.3 Estimating the recombination probability

Our next goal is to prove Proposition 2.2, which gives an approximation for $P(R(i) \ge \tau_J)$. The
idea behind the proof is that every time there is a change in the population, there is some
probability that a lineage will escape the selective sweep at that time, given that it has not
previously escaped. Since the individual probabilities are small, if they sum to S, we will be able
to approximate the probability that the lineage never escapes by $e^{-S}$. It will be easier to work
with conditional escape probabilities given X, so to justify the approximation it will be necessary
to show that the sum of the conditional probabilities has low variance.

For $1 \le t \le \tau$, let $\theta_t = p_B^r(X_{t-1}, X_t)$. Now, $\theta_t$ is the conditional probability, given X, that a
lineage escapes at time t if it has not previously escaped, so we have
\[
P(R(i) \ge \tau_J \mid X) = 1 - \prod_{t=\tau_J+1}^{\tau} [1 - p_B^r(X_{t-1}, X_t)] = 1 - \prod_{t=\tau_J+1}^{\tau} (1 - \theta_t). \tag{3.7}
\]
To estimate the probability that a lineage escapes after time $\tau_J$, we will consider the sum of these
conditional probabilities, which we denote by $\eta_J = \sum_{t=\tau_J+1}^{\tau} \theta_t$. The next lemma shows that to
estimate $P(R(i) \ge \tau_J)$ to within an error of $O((\log N)^{-2})$, it suffices to calculate $E[e^{-\eta_J}]$.

Lemma 3.6. For all J, we have $|P(R(i) \ge \tau_J) - (1 - E[e^{-\eta_J}])| \le C/(\log N)^2$.

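Proof. It follows from the Poisson approximation on p. 140 of Durrett (1996) that
\[
|P(R(i) \ge \tau_J \mid X) - (1 - e^{-\eta_J})| \le \sum_{t=\tau_J+1}^{\tau} \theta_t^2.
\]
By taking expectations, we get
\[
|P(R(i) \ge \tau_J) - (1 - E[e^{-\eta_J}])| \le E\left[ \sum_{t=\tau_J+1}^{\tau} \theta_t^2 \right].
\]
It now remains to bound $E[\sum_{t=1}^{\tau} \theta_t^2]$. By Lemma 3.3,
\[
\sum_{t=1}^{\tau} \theta_t^2 = \sum_{k=1}^{2N-1} \left( \frac{r^2 (2N-k)^2}{(k+1)^2 (2N)^2} U_k + \frac{r^2 \beta_k^2}{(2N)^2} H_k \right).
\]
Therefore, by Lemma 3.2,
\[
E\left[ \sum_{t=1}^{\tau} \theta_t^2 \right]
\le \sum_{k=1}^{2N-1} \left( \frac{r^2 (2N-k)^2}{s (k+1)^2 (2N)^2} + \frac{r^2 \beta_k}{(2N)^2 s} \right)
\le \frac{r^2}{s} \sum_{k=1}^{2N-1} \left( \frac{1}{(k+1)^2} + \frac{1}{(2N)^2} \right)
\le C r^2 \le \frac{C}{(\log N)^2}, \tag{3.8}
\]
which completes the proof. □

The next result will allow us to work with a truncated version of the sum.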
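Lemma 3.7. If $\eta'_J = \sum_{t=\tau_J+1}^{\tau} \theta_t 1_{\{X_{t-1} \ge J\}}$, then $E[\eta_J - \eta'_J] \le C/(J \log N)$.

Proof. Using Lemmas 3.2, 3.3, and 3.5, we get
\[
E[\eta_J - \eta'_J] = \sum_{k=1}^{J-1} \left( p_B^r(k,k+1) E[U_{k,J}] + p_B^r(k,k) E[H_{k,J}] \right)
\]
\[
\le \sum_{k=1}^{J-1} \left( \frac{r(2N-k)}{(k+1)(2N)} \cdot \frac{(1-s)^{J-k}}{s} + \frac{r\beta_k}{2N} \cdot \frac{(1-s)^{J-k}}{s\beta_k} \right)
\le \frac{r}{s} \sum_{k=1}^{J-1} \left( \frac{1}{k} + \frac{1}{2N} \right) (1-s)^{J-k}
\]
\[
\le \frac{2r}{s} (1-s)^{J} \sum_{k=1}^{J-1} \frac{1}{k} \left( \frac{1}{1-s} \right)^{k}
\le \frac{Cr}{J} \le \frac{C}{J \log N}. \qquad \Box
\]

We will work with $\eta'_J$ rather than $\eta_J$ because we can obtain a rather precise estimate of its
expected value, which is given in the next lemma. We will also be able to obtain a bound on its
variance, which will enable us to approximate $E[e^{-\eta'_J}]$ by $e^{-E[\eta'_J]}$.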

Lemma 3.8. E^j] = § E'=j+i l+oU + 



JlogN ^ 

Proof. It follows from Lemma 13.21 and a straightforward calculation that 



Qk Qk-i J\l3k{2-s)j s(3k\ 



\2N 



Therefore 

2iV-l 



k=J 

( r{2N - fc)(l - (1 - sf+^){l - (1 - sf^-'') r(l-{l- sf){l-{l- sf^-^) 



A:=J 



(/fc + l)(2iVs)(l- (l-s)2^) (2iVs)(l - (1 -s)2^) 

7- ^ / i-(l-s)2^-^ \ / (2iV-fc)(l-(l-s)^+^) l-(l-s^ 
s ^ V l-(l-s)2^ JV (2iV)(fc + l) ^ 2i^ 



Now 



^^"^ ^ 1 - (1 - sf^-^\ ( {2N - k){l - (1 - s)^+i) 1 - (1 - 



(l-s)2^ yV (2A^)(A; + l) 2Af 

2iV-l . /9\ , . 

fc=J ^ ^ fc=l ^ ^ ifc=Af+l ^ ^ 



<2(l + logiV)(i_,)A^ + J_<^. 



Therefore, 

27V- 1 



{2N - k){l - {I - sf+^) l-{l-sf\ ^( I 



^[^^] = I A. (2Ar)(A: + l) + 2Ar ) ^ ^ \N 
Also, note that



^ 2N ~ ^ 2N - 2N ' 

k=J k=J 



Therefore, since $r \le C_0/\log N$,

2N~l 



^ k=J ^ ^ fc=J+l ^ ^ 

Since 

r ^ (1 - sf r V n - = ^(^ ~ ^)'^^^ 

s ^ - s{J + 1) ^ ^ ' s2(j+i) ' 

fc=J+l ^ ' k=J+l ^ ' 

the desired result follows from (3.10). □

The key remaining step is to bound $\mathrm{Var}(\eta'_J)$. The necessary bound is given in Lemma 3.10.
The proof uses Lemma 3.9, which can easily be proved by conditioning on M and N.

Lemma 3.9. Suppose $(X_i)_{i=1}^{\infty}$ and $(Y_i)_{i=1}^{\infty}$ are independent i.i.d. sequences such that $E[X_i] = \mu$
and $E[Y_i] = \gamma$. Suppose M and N are integer-valued random variables that are independent of
these sequences. Then $\mathrm{Cov}(X_1 + \cdots + X_M, Y_1 + \cdots + Y_N) = \mu\gamma\,\mathrm{Cov}(M, N)$.
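
The identity in Lemma 3.9 can be confirmed exactly in a small example. The sketch below mirrors the conditioning argument: it assumes Bernoulli summands and a hand-picked joint distribution for (M, N), both of which are illustrative choices rather than anything from the paper, and it compares the two sides in rational arithmetic.

```python
from fractions import Fraction
from math import comb

def binomial_mean(m, p):
    """Mean of a sum of m i.i.d. Bernoulli(p) variables, by direct enumeration."""
    return sum(a * comb(m, a) * p**a * (1 - p) ** (m - a) for a in range(m + 1))

def cov_random_sums(joint, mu, gamma):
    """Cov(X_1+...+X_M, Y_1+...+Y_N) with Bernoulli(mu) and Bernoulli(gamma) summands.

    joint maps (m, n) to P(M = m, N = n).
    """
    e_x = e_y = e_xy = Fraction(0)
    for (m, n), weight in joint.items():
        sx, sy = binomial_mean(m, mu), binomial_mean(n, gamma)
        e_x += weight * sx
        e_y += weight * sy
        e_xy += weight * sx * sy  # the two sequences are independent given (M, N)
    return e_xy - e_x * e_y

def cov_counts(joint):
    """Cov(M, N) computed directly from the joint distribution."""
    e_m = sum(w * m for (m, n), w in joint.items())
    e_n = sum(w * n for (m, n), w in joint.items())
    e_mn = sum(w * m * n for (m, n), w in joint.items())
    return e_mn - e_m * e_n

joint = {(1, 1): Fraction(1, 2), (2, 3): Fraction(1, 4), (4, 2): Fraction(1, 4)}
mu, gamma = Fraction(1, 3), Fraction(2, 5)
print(cov_random_sums(joint, mu, gamma) == mu * gamma * cov_counts(joint))
```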

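Lemma 3.10. There exists a constant C such that $\mathrm{Var}(\eta'_J) \le C/(J(\log N)^2)$.

Proof. Let
\[
a_k = \frac{2N-k}{(k+1)(2N)} \le \frac{1}{k}, \tag{3.11}
\]
\[
b_k = \frac{k(2N-k)}{2N[k^2 + (2N-k)^2 + sk(2N-k)]} \le \frac{k(2N-k)}{2N^3}. \tag{3.12}
\]
Then $\eta'_J = \sum_{t=\tau_J+1}^{\tau} \theta_t 1_{\{X_{t-1} \ge J\}} = r \sum_{k=J}^{2N-1} (a_k U_k + b_k H_k)$. For any random variables X and Y,
\[
\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)
\le \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)} \le 4 \max\{\mathrm{Var}(X), \mathrm{Var}(Y)\}.
\]
Therefore,
\[
\mathrm{Var}(\eta'_J) \le 4r^2 \max\left\{ \mathrm{Var}\left( \sum_{k=J}^{2N-1} a_k U_k \right), \mathrm{Var}\left( \sum_{k=J}^{2N-1} b_k H_k \right) \right\}. \tag{3.13}
\]
We will bound $\mathrm{Var}(\sum_{k=J}^{2N-1} a_k U_k)$ and $\mathrm{Var}(\sum_{k=J}^{2N-1} b_k H_k)$ by C/J, which will prove the lemma.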
To bound $\mathrm{Var}(\sum_{k=J}^{2N-1} a_k U_k)$, we will need to bound $\mathrm{Cov}(U_k, U_l)$. To do this, we will break
up $U_l$ into jumps from l to l + 1 that occur before the last visit to k and those that occur after
the last visit to k. More formally, let $\zeta_k = \sup\{t : X_t = k\}$. If $k \le l$, then $U_l = U'_{k,l} + \bar{U}_{k,l}$, where
\[
U'_{k,l} = \#\{t \ge \zeta_k : X_t = l \text{ and } X_{t+1} = l+1\},
\]
\[
\bar{U}_{k,l} = \#\{t < \zeta_k : X_t = l \text{ and } X_{t+1} = l+1\}.
\]
The processes $(X_t)_{0 \le t \le \zeta_k}$ and $(X_t)_{\zeta_k \le t \le \tau}$ are independent. Therefore, $U_k$ and $U'_{k,l}$ are
independent, and $\bar{U}_{k,l}$ and $U'_{k,l}$ are independent. As observed in the proof of Lemma 3.2, $U_l$ has a
Geometric($q_l$) distribution. Likewise, note that $P'(X_s > l \text{ for all } s > t \mid X_t = l+1) = p(l,2N,l+1)$
and $P'(X_s > k \text{ for all } s > t \mid X_t = l+1) = p(k,2N,l+1)$. Therefore,
\[
P(X_s > l \text{ for all } s > t \mid X_t = l+1, X_s > k \text{ for all } s > t) = \frac{p(l,2N,l+1)}{p(k,2N,l+1)}.
\]
It follows that if we let $v_{k,l} = p(l,2N,l+1)/p(k,2N,l+1)$, then $U'_{k,l}$ has a Geometric($v_{k,l}$)
distribution. Using Lemmas 3.1 and 3.2 and the fact that $q_l = p(l,2N,l+1)/p(0,2N,l+1)$, we
have
\[
\frac{1}{q_l} - \frac{1}{v_{k,l}}
= \frac{1 - (1-s)^{2N-l}}{s} \left( \frac{1 - (1-s)^{l+1}}{1 - (1-s)^{2N}} - \frac{1 - (1-s)^{l+1-k}}{1 - (1-s)^{2N-k}} \right)
\le \frac{1}{s} \left( 1 - (1 - (1-s)^{l+1-k}) \right) = \frac{(1-s)^{l+1-k}}{s}. \tag{3.14}
\]

s s 

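Also, $\mathrm{Var}(U_l) = \mathrm{Var}(U'_{k,l}) + \mathrm{Var}(\bar{U}_{k,l})$ because $\bar{U}_{k,l}$ and $U'_{k,l}$ are independent. Therefore, if
$J \le k \le l < 2N$, then by the formula for the variance of a geometric distribution,
\[
\mathrm{Var}(\bar{U}_{k,l}) = \mathrm{Var}(U_l) - \mathrm{Var}(U'_{k,l}) = \frac{1-q_l}{q_l^2} - \frac{1-v_{k,l}}{v_{k,l}^2}
= \left( \frac{1}{q_l} - \frac{1}{v_{k,l}} \right) \left( \frac{1}{q_l} + \frac{1}{v_{k,l}} - 1 \right)
\le \frac{2}{s} \cdot \frac{(1-s)^{l+1-k}}{s}, \tag{3.15}
\]
where the inequality uses (3.14) and the facts that $q_l \ge s$ and $v_{k,l} \ge s$. Also,
\[
\mathrm{Var}(U_l) = \frac{1-q_l}{q_l^2} \le \frac{1}{s^2}. \tag{3.16}
\]
Since $U_k$ and $U'_{k,l}$ are independent, it follows from (3.15) and (3.16) that if $k \le l$, then
\[
\mathrm{Cov}(U_k, U_l) = \mathrm{Cov}(U_k, U'_{k,l} + \bar{U}_{k,l}) = \mathrm{Cov}(U_k, \bar{U}_{k,l})
\le \sqrt{\mathrm{Var}(U_k)\mathrm{Var}(\bar{U}_{k,l})} \le \frac{\sqrt{2}}{s^2} (1-s)^{(l-k)/2}. \tag{3.17}
\]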
Using H3.17() and (|3.11|) . we calculate 

■ 2N~1 X 2N-12N-1 „ /t:: 2Ar-l 2Af-l 

s)a-fe)/2 



^(^kUk]= E afc«/Cov([/fc,[/0 < ^ 5^ 5^ 

k=J ^ k=J l=J ^ k=J l=k 

/K2N-1 ,2N-1 . 2N-1 „ 
It remains to bound $\mathrm{Var}(\sum_{k=J}^{2N-1} b_k H_k)$. Recall from the proof of Lemma 3.2 that
\[
p_k = P(X_t \ne X_{t-1} \mid X_{t-1} = k) = \frac{k(2N-k)(2-s)}{(2N)^2}
\]
and that $D_k + U_k = U_{k-1} - 1 + U_k$, using the convention that $U_0 = 1$. Therefore, we can write
$H_k = G_1 + G_2 + \cdots + G_{U_k + U_{k-1} - 1}$, where $(G_i)_{i=1}^{\infty}$ is an i.i.d. sequence of random variables such
that $G_i + 1$ has a Geometric($p_k$) distribution for all i. Thus, $E[G_i] = 1/p_k - 1$. If $k < l$, then by
Lemma 3.9,
\[
\mathrm{Cov}(H_k, H_l) = \left( \frac{1}{p_k} - 1 \right) \left( \frac{1}{p_l} - 1 \right) \mathrm{Cov}(U_k + U_{k-1} - 1, U_l + U_{l-1} - 1)
\le \frac{1}{p_k p_l} \mathrm{Cov}(U_k + U_{k-1}, U_l + U_{l-1})
\le \frac{C}{p_k p_l} (1-s)^{(l-k)/2}.
\]
Note that (3.12) implies
\[
\frac{b_k}{p_k} \le \frac{k(2N-k)}{2N^3} \cdot \frac{(2N)^2}{k(2N-k)(2-s)} = \frac{2}{(2-s)N} \le \frac{2}{N}.
\]



Therefore, 



2N-1 



2N-1 2N-1 



2N-12N-1 



Var( ^bkHk)= ^ bkbiCov{Hk, Hi) < C J] ^{1 - s 



M-k)/2 



k=J 



k=J l=J 

2N-1 2N-1 



< 



c_ 

iV2 



E Ed 



< 



C 



k=J l=k 
2N-1 



PkPl 



k=J l=k 

The lemma follows from dXTHl), and (ITT!?1) . 

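Proof of Proposition 2.2. Lemma 3.6 gives
\[
|P(R(i) \ge \tau_J) - (1 - E[e^{-\eta_J}])| \le E\left[ \sum_{t=\tau_J+1}^{\tau} \theta_t^2 \right] \le \frac{C}{(\log N)^2}.
\]
Since $|\frac{d}{dx} e^{-x}| \le 1$ for $x \ge 0$, Lemma 3.7 gives
\[
E[e^{-\eta'_J} - e^{-\eta_J}] \le E[\eta_J - \eta'_J] \le \frac{C}{J(\log N)}.
\]
Using Jensen's inequality and Lemma 3.10,
\[
|E[e^{-\eta'_J}] - e^{-E[\eta'_J]}| \le E|e^{-\eta'_J} - e^{-E[\eta'_J]}| \le E|\eta'_J - E[\eta'_J]| \le \mathrm{Var}(\eta'_J)^{1/2} \le \frac{C}{\sqrt{J}(\log N)}. \tag{3.19}
\]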

Furthermore, it follows from Lemma 13.81 that 



N J\ogN 

Combining the last four equations gives the proposition. 



□ 
4 Coalescence of two lineages



In this section, we prove Propositions 2.3, 2.4, and 2.5, all of which pertain to the probabilities
that two lineages in the sample coalesce. We begin by computing the following coalescence
probabilities for integers k and l such that $1 \le k \le 2N-1$, $1 \le l \le 2N$, and $|k - l| \le 1$:
\[
p^c_{BB}(k,l) = P(A_{t-1}^{t}(i) = A_{t-1}^{t}(j) \mid X_{t-1} = k, X_t = l, B_t(i) = 1, B_t(j) = 1),
\]
\[
p^c_{bb}(k,l) = P(A_{t-1}^{t}(i) = A_{t-1}^{t}(j) \mid X_{t-1} = k, X_t = l, B_t(i) = 0, B_t(j) = 0),
\]
\[
p^c_{Bb}(k,l) = P(A_{t-1}^{t}(i) = A_{t-1}^{t}(j) \mid X_{t-1} = k, X_t = l, B_t(i) = 1, B_t(j) = 0).
\]

As with the recombination probabilities in the previous section, the Markov property implies
that the coalescence probabilities are the same under P' as under P.

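Lemma 4.1. We have
\[
p^c_{BB}(k,k-1) = p^c_{bb}(k,k+1) = 0,
\]
\[
p^c_{BB}(k,k+1) = \frac{2}{k(k+1)} \left( 1 - \frac{r(2N-k)}{2N} \right), \qquad
p^c_{bb}(k,k-1) = \frac{2}{(2N-k)(2N-k+1)} \left( 1 - \frac{rk}{2N} \right),
\]
\[
p^c_{BB}(k,k) = \frac{2\beta_k}{k(2N-k)} \left( 1 - \frac{r(2N-k)}{2N} \right), \qquad
p^c_{bb}(k,k) = \frac{2\beta_k}{k(2N-k)} \left( 1 - \frac{rk}{2N} \right),
\]
\[
p^c_{Bb}(k,k) = \frac{r\beta_k}{k(2N-k)}, \qquad
p^c_{Bb}(k,k+1) = \frac{r}{2N(k+1)}, \qquad
p^c_{Bb}(k,k-1) = \frac{r}{2N(2N-k+1)}.
\]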



Proof. This result follows from a series of straightforward calculations, similar to those used to
prove Lemma 3.3. We explain the idea behind some of these calculations. When $X_{t-1} = k$
and $X_t = k-1$, the new individual born at time t has the b allele. Therefore, two B lineages
cannot coalesce at this time, so $p^c_{BB}(k,k-1) = 0$. By the same reasoning, $p^c_{bb}(k,k+1) = 0$.
When $X_{t-1} = k$ and $X_t = k+1$, the new individual born at time t has the B allele. With
probability $r(2N-k)/2N$, this individual inherits its allele at the neutral site from a member of
the b population because of recombination. If this does not happen, then two of the B individuals
get their allele at the neutral site from the same parent. Thus, conditional on $B_t(i) = B_t(j) = 1$,
the probability that the ith and jth individuals get their allele at the neutral site from the same
parent is $2/[k(k+1)]$, which implies the formula for $p^c_{BB}(k,k+1)$. The calculation of $p^c_{bb}(k,k-1)$
is similar.

Now suppose $X_{t-1} = X_t = k$. Conditional on this event, a B replaces a B with probability
$k^2/[k^2 + (2N-k)^2 + sk(2N-k)]$. If the new B gets its allele at the neutral site from a member
of the B population, which has probability $1 - r(2N-k)/2N$, and if $B_t(i) = B_t(j) = 1$, then the
probability that the ith and jth lineages coalesce is $2/k^2$, as there are k possibilities both for the
individual who dies and the parent of the new individual. The formula for $p^c_{BB}(k,k)$ follows, and
$p^c_{bb}(k,k)$ can be calculated in the same way. Next, to find $p^c_{Bb}(k,k)$, note that if a B replaces a
B, and $B_t(i) = 1$ and $B_t(j) = 0$, then the probability of coalescence is $r/(2kN)$, as there must
be recombination, and there are k choices for the B individual that is just born and 2N choices
for the parent from which it gets its allele at the neutral site. If instead a b replaces a b, which
happens with probability $(2N-k)^2/[k^2 + (2N-k)^2 + sk(2N-k)]$ conditional on $X_{t-1} = X_t = k$,
the probability of coalescence is $r/[(2N-k)(2N)]$. Adding the probabilities for the two cases
gives the formula for $p^c_{Bb}(k,k)$.

Finally, to calculate $p^c_{Bb}(k,k+1)$ and $p^c_{Bb}(k,k-1)$, note that when a B replaces a b, the
probability that a B lineage coalesces with a b lineage is $r/[(k+1)(2N)]$, as there must be
recombination, and there are k + 1 choices for the B individual that was just born and 2N
choices for its parent. Likewise, the coalescence probability is $r/[(2N-k+1)(2N)]$ when a b
replaces a B. □

Proof of Proposition 2.3. We consider first the case in which the jth lineage is descended from a
member of the B population at the time of coalescence. Summing over the possible values k for
$X_{G(i,j)}$ and applying Lemmas 3.2 and 4.1, we get



P{G{i,j) > 0,i?(,(,,,)+i(A^(^'^)+^(i)) = 0, and = 1) 

2N-1 

< J2 {p'Bbik,k + l)E[Uk]+p'B,{k,k-l)E[Dk]+p%,{k,k)E[Hk]) 



k=l 
2N-1 



- ^ i 2N{k + l)s ^ 2N{2N -k + l)s ^ sk{2N - k) 
(- 



k=l 

2N-1 



< TTT^ > . ^ + ^. r + 



2Ns ^^\k 2N-k k{2N - k) 

_2r^^^ 1 4r A 1 4r(l + logiV) C 

~ T ^ k{2N - k) ~^s^_^k- Ifs - iV' 

It remains to consider the case in which the ith and j'th lineages are both descended from 
a member of the b population at the coalescence time. By summing over the possible values of 
and XG'(jj), we see that it suffices to show 

2N-1 2N-1 



E P(^m = ^)P{^Gir,j) = k,BGii,)+i{A?^'''^+\i)) = 0, and 

f'^l k=l ^ 

BG(«)+i(.4?"-''+'(i))=0 



If SG(ij)+i(^r ^^'^-"^^(i)) = 0, then G{i,j) < R{i). Therefore, it follows from Lemmas EISl and E] 
and the time-reversal argument in the proof of Proposition 12.11 that 

= k and BGi.,j)+iiA^'-'''^+\i)) = BGii,j)+iiA^^'''^+Hj)) = 0|X^(,) = i) 

< plb{k, k - l)E[D2N-k+i,2N-£] + Pbbik, k)E[H-2N-k,2N-e\ 

< pt + 2(2yV -5\ „i„{,i _ 1, _ 4iV mi„{(l - sf-', 1) 



sk{2N-kY ) / ' J sk{2N-ky 
Combining this result with Lemma 3.4, we get that the left-hand side of (4.1) is at most

^ r / ^ 4iVmm{(l - s)" ,1} 
2^ 771 2^ 



<4r- V V V iV(l-.)^ ^ 



Using and the fact that N/[k{2N - k)] < 1 ioi 1 < k < 2N - 1, we get 



4r'^'l/'^' N{l-s)>'-' \ 4r/ 2C(l + logjV) \ C 



For the second term in (|4.2|) . we have 

^ 2^ 7 ( 2^ U2N -kfj kN^ ) ^ ~N\^k{2N-kf 



N 



< 



l=N+l 

l^y^l) ^ ^ k(2N - kV 

^ 1=1 ^ k=\ l=k ^ ' 

^ 4r(l + logjV)2 4r ^ 1 C{\ogN) 

Ns^ s2 ^ fc(2iV -k) - N ■ ^ ■ ^ 

Using and in (|i?2|) proves (HU. □ 

The next lemma, which bounds the probability that there are k individuals with the B allele
at the time the ith and jth lineages coalesce, will be used in the proofs of Propositions 2.4 and 2.5.

Lemma 4.2. We have 

= k and i?G(.,)+i(^?^"^'^^'(^)) = Bai^,)^AA?^'^^^^\j)) = 1) < (4.5) 
Proof. By Lemmas 3.2 and 4.1, the probability on the left-hand side of (4.5) is at most
p%^{k,k + l)E[Uk]+p'BB{k,k)E[Hk]<--^—+ ^ 



sk{k + l) sk{2N-k) 
2{2N -k) + 2k _ m 
- sk^{2N-k) ~sk'^{2N-k)' ° 

Proof of Proposition 2.4. By Proposition 2.3, it suffices to show that

P(0 < Rii) < G{z,j) and = i?G(.,,)+i(^?(^'^'^+Hj)) = 1) < 
By Lemmas 3.2 and 3.3 and the time-reversal argument in the proof of Proposition 2.1,

= £ and < Rii) < G{i, j)\XGii,j) = k) 
<pWJ+'^)E[U2N-e-l,2N-k] +pW,^)E[H2N-e,2N-k] 

Combining this result with (4.5), we get

P(0 < Rii) < Gii,j) and = = 1) 

< y y ilmin{(l-s/-^l}) 



< 



2 

fc=l - \ - V - £=1 



The first term in the sum on the right-hand side of (4.6) is at most

2N~1 ,j. ,2N-1 N /i\ / ^ 1 2Af-l ^ 



which is bounded by a constant. The other term in the sum in (4.6) is at most

N{l + \ogk) ^ A 1 + logfc 1 + log(2jV) 

^ A;2(2iV - fc) - ^ B ^ ^ N(2N - k) ' 

k=l ^ ' k=l k=N+l ^ ' 

which is also bounded by a constant. Since $4r/s^2 \le C/(\log N)$, the proposition follows. □
Proof of Proposition 2.5. By reasoning similar to that used to prove Lemma 4.2, we have

P(G(i, j) > Tj and BGi^,,)+Mr^'''^^'ii)) = i?G(^,,)+i(^?(^'^')+'«) = 1) 

27V-1 

< E {p'BB(.k,k + l)ElUk,j]+p'hB(.k,k)E[Hk,j]). (4.7) 



k=l 



However, this time we keep the factor $\min\{(1 - s)^{J-k}, 1\}$ from Lemma 3.2 to bound the right-hand
side of (4.7) by



' 4N 4N 



^^'^ \k^{2N -k)^ sk^{2N-k)' ^^'^^ 

Using the fact that $N/[k(2N - k)] \le 1$ for $1 \le k \le 2N - 1$ and then Lemma ESI we have
For the second term in (4.8), we observe

V < V — + V < 4(1+ log TV) 

^ sk'^(2N -k) - ^ sk^ ^ sN(2N - k) - sJ Ns 

k=J+l ^ ' fc=J+l k=N ^ ' 

Since $J < C'N/(\log N)$, the bounds in the last two equations add up to at most $C/J$, and the desired
result follows from these bounds and Proposition 2.3. □



5 Approximate independence of n lineages 

In this section, we prove Proposition 2.6. We first establish a lemma that involves the coupling
of two {0, 1, . . . , n}-valued random variables.

Lemma 5.1. Let $V$ and $V'$ be {0, 1, . . . , n}-valued random variables such that $E[V] = E[V']$.
Then, there exist random variables $\tilde{V}$ and $\tilde{V}'$ on some probability space such that $V$ and $\tilde{V}$ have
the same distribution, $V'$ and $\tilde{V}'$ have the same distribution, and

$$P(\tilde{V} \ne \tilde{V}') \le n \max\{P(V \ge 2), P(V' \ge 2)\}.$$

Proof. It is clear that $\tilde{V}$ and $\tilde{V}'$ can be constructed such that they have the same distributions as $V$
and $V'$ respectively and $P(\tilde{V} = \tilde{V}') \ge \min\{P(V = 0), P(V' = 0)\} + \min\{P(V = 1), P(V' = 1)\}$.
Note that $P(V = 0) \ge 1 - E[V]$. Since $E[V] = E[V']$, it follows that $\min\{P(V = 0), P(V' = 0)\} \ge 1 - E[V]$.
Also, $P(V = 1) = E[V] - \sum_{k=2}^{n} k P(V = k)$, so $P(V = 1) \ge E[V] - n P(V \ge 2)$.
Likewise, $P(V' = 1) \ge E[V] - n P(V' \ge 2)$. It follows that

$$P(\tilde{V} = \tilde{V}') \ge 1 - n \max\{P(V \ge 2), P(V' \ge 2)\}. \quad □$$
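
As a numerical illustration of Lemma 5.1 (not part of the original argument), the following Python sketch computes the mismatch probability of a maximal coupling of two distributions on {0, 1, . . . , n}. A maximal coupling matches the two variables on every value rather than only on 0 and 1, so it does at least as well as the coupling used in the proof. The function names and the example distributions are illustrative assumptions.

```python
from math import comb


def binom_pmf(n, p):
    """Probability mass function of binomial(n, p) as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def mean(pmf):
    """Mean of a distribution on {0, ..., n} given as a list of probabilities."""
    return sum(k * q for k, q in enumerate(pmf))


def mismatch_probability(pmf_v, pmf_w):
    """P(V~ != V~') under a maximal coupling: 1 - sum_k min(P(V=k), P(V'=k))."""
    return 1.0 - sum(min(a, b) for a, b in zip(pmf_v, pmf_w))


def lemma_bound(pmf_v, pmf_w):
    """Right-hand side of Lemma 5.1: n * max{P(V >= 2), P(V' >= 2)}."""
    n = len(pmf_v) - 1
    return n * max(sum(pmf_v[2:]), sum(pmf_w[2:]))


# Example: V is binomial(5, 0.05); V' equals 5 with probability 0.05 and 0
# otherwise. Both have mean 0.25, so the hypothesis E[V] = E[V'] holds.
pmf_v = binom_pmf(5, 0.05)
pmf_w = [0.95, 0.0, 0.0, 0.0, 0.0, 0.05]
print(mismatch_probability(pmf_v, pmf_w), lemma_bound(pmf_v, pmf_w))
```

The equal-means hypothesis matters: without it, the mass at 0 and 1 need not line up, and the bound can fail.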

Recah that Kt = #{i G {1, . . . , n} : R{i) > t} for < t < r. Define 9t = X*) as in 

section 3, and define rjj = Yl't=Tj+i ^^'^ v'j = J2t=Tj+i ^t^{Xt-i>J} Lemma l37fl Finally, 

let Fj = P{R{i) > Tj\X), which'is shown in ^l^i to be equal to 1 - ni=rj+i(l " ^t)- 

Lemma 5.2. If $J < C'N/(\log N)$, then for all $d \in \{0, 1, \dots, n\}$,

P{Kr,=d)-('])E[Fj{l-Fjr-'^] 



-"^^"fl^'7} + (b^' 



Proof. Note that $K_\tau = 0$. Also, $K_{t-1} - K_t \in \{0, 1, \dots, n\}$ for all $1 \le t \le \tau$, and

$$E[K_{t-1} - K_t \mid X, (K_u)_{u=t}^{\tau}] = (n - K_t)\theta_t.$$

Define another process $(K'_t)_{t=0}^{\tau}$ such that $K'_\tau = 0$ and the conditional distribution of $K'_{t-1} - K'_t$
given $X$ and $(K'_u)_{u=t}^{\tau}$ is binomial$(n - K'_t, \theta_t)$. Note that $E[K'_{t-1} - K'_t \mid X, (K'_u)_{u=t}^{\tau}] = (n - K'_t)\theta_t$.
We will show that the processes $(K_t)_{t=0}^{\tau}$ and $(K'_t)_{t=0}^{\tau}$ can be coupled so that

F(K, ^ K for some t>r.,)< min { §} + (5.1) 
Equation (5.1) implies the lemma because the conditional distribution of $K'_{\tau_J}$ given $X$ is binomial
with parameters $n$ and $1 - \prod_{t=\tau_J+1}^{\tau}(1 - \theta_t) = F_J$.

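By applying Lemma 5.1 with $V = K_{t-1} - K_t$ and $V' = K'_{t-1} - K'_t$, we can construct the
process $(K'_t)_{t=0}^{\tau}$ on the same probability space as $(K_t)_{t=0}^{\tau}$ such that

$$P(K_t \ne K'_t \text{ for some } t \ge \tau_J \mid X) \le n \sum_{t=\tau_J+1}^{\tau} P(K_{t-1} - K_t \ge 2 \mid X, (K_u)_{u=t}^{\tau}) + n \sum_{t=\tau_J+1}^{\tau} P(K'_{t-1} - K'_t \ge 2 \mid X, (K'_u)_{u=t}^{\tau}). \quad (5.2)$$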

If $K_{t-1} - K_t \ge 2$ for some $t > \tau_J$, then $\tau_J \le R(i) \le G(i,j)$ for some i and j. We have
$P(\tau_J \le R(i) \le G(i,j)) \le C/(\log N)$ for all J by Proposition 2.4 and $P(\tau_J \le R(i) \le G(i,j)) \le C/J$ for
all $J < C'N/(\log N)$ by Proposition 2.5. Therefore, for $J < C'N/(\log N)$,

r -1 T 

E Yl P{Kt-i-Kt>2\X,{Ku)l=t) < J2 P{Kt-i -Kt> 2 and t>Tj) 

^t=Tj + l -I i=l 

Tl 

< -P{Kt-i -Kt>2 for some t > tj) 
( C C 1 

<min<^- -,-y (5.3) 

Now a binomial random variable will be at least 2 if and only if there is some pair of successful
trials, so $P(K'_{t-1} - K'_t \ge 2 \mid X, (K'_u)_{u=t}^{\tau}) \le \binom{n}{2}\theta_t^2$ and

$$\sum_{t=\tau_J+1}^{\tau} P(K'_{t-1} - K'_t \ge 2 \mid X, (K'_u)_{u=t}^{\tau}) \le \binom{n}{2} \sum_{t=\tau_J+1}^{\tau} \theta_t^2. \quad (5.4)$$

By taking expectations in (5.2) and applying (5.3), (5.4), and (3.9), we get (5.1), which completes
the proof. □
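
The pair-counting bound used for the binomial increments can be checked directly. The sketch below compares the exact probability that a binomial(n, θ) variable is at least 2 with the union bound over pairs of trials, $\binom{n}{2}\theta^2$. The function names are illustrative, and `theta` stands for the per-lineage success probability written $\theta_t$ above.

```python
from math import comb


def prob_at_least_two(n, theta):
    """Exact P(S >= 2) for S ~ binomial(n, theta)."""
    return 1.0 - (1 - theta) ** n - n * theta * (1 - theta) ** (n - 1)


def pair_bound(n, theta):
    """Union bound over the C(n, 2) pairs of trials that could both succeed."""
    return comb(n, 2) * theta**2


for n in (2, 5, 20):
    for theta in (0.01, 0.05, 0.2):
        print(n, theta, prob_at_least_two(n, theta), pair_bound(n, theta))
```

For n = 2 the two quantities coincide, since there is only one pair of trials; for larger n the bound is loose but still of order $\theta^2$, which is what the summation in (5.4) needs.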



Proof of Proposition 2.6. In view of Lemma 5.2, it suffices to show that



\E[Fj{l - FjT-'^] - qj{l - gj)"-^| < min ^| 



+ (log^iV)2 (^-^^ 



for all $d \in \{0, 1, \dots, n\}$. If $0 \le a_1, \dots, a_n \le 1$ and $0 \le b_1, \dots, b_n \le 1$, then
$|a_1 \cdots a_n - b_1 \cdots b_n| \le \sum_{i=1}^{n} |a_i - b_i|$, as shown in Lemma 4.3 of chapter 2 of Durrett (1996). Therefore,

$$|E[F_J^d(1 - F_J)^{n-d}] - q_J^d(1 - q_J)^{n-d}| \le E[d|F_J - q_J| + (n - d)|(1 - F_J) - (1 - q_J)|] = nE[|F_J - q_J|].$$

Note that 

$$|F_J - q_J| \le |F_J - (1 - e^{-\eta_J})| + |e^{-\eta_J} - e^{-\eta'_J}| + |e^{-\eta'_J} - e^{-E[\eta'_J]}| + |(1 - e^{-E[\eta'_J]}) - q_J|. \quad (5.6)$$

It follows from (gSl) and (j221) that E[\Fj - (1 - e"^-')]] < C/(logAf)2. The expectations of 
the second, third, and fourth terms on the right-hand side of (5.6) can be bounded as in the
conclusion of the proof of Proposition 2.2 at the end of Section 3. All of those error estimates
are smaller than the right-hand side of (5.5), so the desired result follows. □
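
The product inequality quoted from Durrett (1996), $|a_1 \cdots a_n - b_1 \cdots b_n| \le \sum_i |a_i - b_i|$ for numbers in [0, 1], is easy to test numerically. The following sketch is only a sanity check on random inputs, with illustrative function names; it is not a proof.

```python
import math
import random


def product_gap_and_bound(a, b):
    """Return (|prod(a) - prod(b)|, sum_i |a_i - b_i|) for two equal-length lists."""
    gap = abs(math.prod(a) - math.prod(b))
    bound = sum(abs(x - y) for x, y in zip(a, b))
    return gap, bound


def worst_violation(trials, seed=0):
    """Largest value of gap - bound over random vectors in [0, 1]^n (should be <= 0)."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        n = rng.randint(1, 8)
        a = [rng.random() for _ in range(n)]
        b = [rng.random() for _ in range(n)]
        gap, bound = product_gap_and_bound(a, b)
        worst = max(worst, gap - bound)
    return worst


# The case used in the proof: d copies of F and n - d copies of 1 - F,
# compared with the same pattern built from q. The bound is then n * |F - q|.
F, q, n, d = 0.30, 0.25, 5, 2
print(product_gap_and_bound([F] * d + [1 - F] * (n - d), [q] * d + [1 - q] * (n - d)))
```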
6 A branching process approximation



In this section, we will show how the evolution of the individuals with the B allele during the first
stage of the selective sweep can be approximated by a supercritical branching process. This will
lead to a proof of Proposition 2.7. Recall that the first stage of the sweep consists of the times
$0 \le t \le \tau_J$, where $J = \lfloor (\log N)^a \rfloor$ for some fixed constant $a > 4$. We will assume throughout this
section that N is large enough that J < N. In subsection 6.1, we explain the coupling between
the branching process and the population model. In subsection 6.2, we consider the lineages in
the branching process with an infinite line of descent. Proposition 2.7 is proved using these ideas
in subsection 6.3.

6.1 Coupling the population model with a branching process 

We begin by constructing a multi-type branching process with the properties mentioned in
Proposition 2.7. That is, the process will start with one individual at time zero, and each individual
will give birth at rate one and die at rate $1 - s$. Each new individual has the same type as its
parent with probability $1 - r$ and a new type, different from all other types, with probability r.
We now explain how to construct this branching process so that until the number of individuals
reaches J, the branching process will be coupled with the population process $(M_t)_{t=0}^{\infty}$ with high
probability.
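
To make the branching process concrete, here is a minimal simulation sketch of its jump chain (births at rate 1, deaths at rate 1 − s, new type with probability r), run until the population first reaches J or dies out. It is an independent illustration, not the coupled construction described below: the function names, the use of Python's `random` module, and the parameter values are all assumptions. By the usual gambler's-ruin calculation, the chance of reaching J from one individual should be close to $s/(1 - (1 - s)^J)$.

```python
import random


def run_until_level(J, s, r, rng):
    """Run the jump chain until there are J individuals or none.

    Returns the list of types alive when the population first reaches J,
    or an empty list if the process dies out first. Type 0 is the type of
    the single individual present at time zero.
    """
    types = [0]
    next_type = 1
    p_birth = 1.0 / (2.0 - s)  # birth rate 1 against death rate 1 - s
    while 0 < len(types) < J:
        idx = rng.randrange(len(types))  # individual involved in the next event
        if rng.random() < p_birth:
            if rng.random() < r:
                types.append(next_type)  # child founds a brand-new type
                next_type += 1
            else:
                types.append(types[idx])  # child inherits its parent's type
        else:
            types[idx] = types[-1]  # delete individual idx by swapping with the last
            types.pop()
    return types


def estimate_reach_probability(J, s, r, trials, seed=0):
    """Monte Carlo estimate of the probability that the population reaches J."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if run_until_level(J, s, r, rng))
    return hits / trials


J, s, r = 20, 0.3, 0.01
print(estimate_reach_probability(J, s, r, 10000), s / (1 - (1 - s) ** J))
```

Counting how many entries of the returned list equal 0 gives the number of sampled individuals that still carry the founder's type, which is the quantity the marked block of the partition records.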

Define random variables $0 = \zeta_0 < \zeta_1 < \zeta_2 < \cdots$ such that $(\zeta_t - \zeta_{t-1})_{t=1}^{\infty}$ is an i.i.d. sequence
of random variables, each having an exponential distribution with mean 1/2N. The branching
process will start with one individual at time zero. Until the population size reaches J, there will
be no births during the intervals $(\zeta_{t-1}, \zeta_t)$, but births and deaths can occur at the times $\zeta_1, \zeta_2, \dots$.
This branching process will be coupled with $(M_t)_{t=0}^{\infty}$ so that, with high probability, the number
of individuals with the B allele at time t will be the same as the number of individuals in the
branching process at time $\zeta_t$. To facilitate this coupling, we will also assign to each individual
in the branching process a label such that all the individuals alive at a given time have distinct
labels. We denote by $L_t$ the set of all i such that there is an individual labeled i in the population
at time $\zeta_t$. When $L_t = \{i : B_t(i) = 1\}$, meaning that the labels are the same as the individuals
in the population model with the B allele at time t, we say the coupling holds at time t. The
label of the individual at time zero will be U, where U is the random variable with a uniform
distribution on {1, . . . , 2N} defined at the beginning of section 2. We have $B_0(U) = 1$, so the
coupling holds at time zero.

For the branching process to have the desired properties, each individual must have probability
1/2N of giving birth at time $\zeta_t$ and probability $(1 - s)/2N$ of dying at time $\zeta_t$. Also, at most
one birth or death event can occur at a time. Suppose the coupling holds at time $t - 1$ and
$i \in L_{t-1}$. Also, assume $X_{t-1} = k$. In the population model, the number of B's increases by one
at time t, with i being the parent of the new individual, if $I_{t,2} = i$ and $B_{t-1}(I_{t,1}) = 0$, which has
probability $(2N - k)/(2N)^2$. Also, the ith individual in the population dies at time t, causing
the B population to decrease in size by one, if $I_{t,1} = i$, $B_{t-1}(I_{t,2}) = 0$, and $I_{t,4} = 1$, which has
probability $(2N - k)(1 - s)/(2N)^2$. Consequently, we can define the branching process such that
the individual labeled i gives birth at time $\zeta_t$ if and only if $I_{t,2} = i$, which has probability 1/2N.
We give the new individual the label $I_{t,1}$, unless one of the other individuals already has this
label. As a result, the coupling will hold at time t if $B_{t-1}(I_{t,1}) = 0$ but not if $B_{t-1}(I_{t,1}) = 1$.
The individual labeled i will die with probability $(1 - s)/2N$, and will die whenever $I_{t,1} = i$,
$B_{t-1}(I_{t,2}) = 0$, and $I_{t,4} = 1$. Then, the probability that the coupling fails to hold at time t is

$$k\left(\frac{1}{2N} - \frac{2N - k}{(2N)^2}\right) + k\left(\frac{1 - s}{2N} - \frac{(2N - k)(1 - s)}{(2N)^2}\right) = \frac{k^2(2 - s)}{(2N)^2}. \quad (6.1)$$

If a new individual in the branching process is born at time t, we say that it has a new type
whenever $I_{t,5} = 1$, which has probability r. This means that births of individuals with new types
correspond to recombinations in the population model.
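
The coupling-failure probability can be recomputed with exact rational arithmetic. The sketch below assumes the reading of the construction given above: each of the k labeled individuals gives birth in the branching process with probability 1/2N, of which $(2N - k)/(2N)^2$ is matched by the population model, and similarly for deaths with the extra factor 1 − s. The function names are illustrative.

```python
from fractions import Fraction


def coupling_failure_probability(k, N, s):
    """Sum over the k labeled individuals of the unmatched birth and death mass."""
    two_n = 2 * N
    matched = Fraction(two_n - k, two_n**2)  # events shared with the population model
    birth_gap = Fraction(1, two_n) - matched
    death_gap = (1 - s) * (Fraction(1, two_n) - matched)
    return k * birth_gap + k * death_gap


def closed_form(k, N, s):
    """The simplified expression k^2 (2 - s) / (2N)^2."""
    return k * k * (2 - s) / Fraction((2 * N) ** 2)


s = Fraction(1, 10)
print(coupling_failure_probability(3, 50, s), closed_form(3, 50, s))
```

Because the failure probability is of order $k^2/N^2$, summing it over the times at which the B population is below J stays small as long as J is much smaller than N.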

Fix a positive integer m. On the event that the branching process has at least J individuals at 
some time, we define a random marked partition as follows. Define k such that is the first 
time at which there are J individuals. Define a random injective map a : {1, . . . , m} such 
that all {J)m possible maps are equally likely. Then say that i j if and only if the individuals 
labeled a{i) and a{j) are of the same type. Mark the block of consisting of all i such that 
the individual labeled a{i) has the same type as the individual at time zero. Furthermore, 
we can define a such that a = a on the event that k = tj and Ltj = {i : BTj{i) = 1}, 
where a : {!,..., m} {i : BTj{i) = 1} is the map defined in the section 2 that is used 
in the construction of the random marked partition Recall that i j if and only if 

A'^j{a{{)) = yl°^(o-(j)), and the block {i : Bo{A'^j{cr{i))) = 1} is marked. 

Suppose Xt = J for some t and the coupling holds for all t < tj, so k = tj. Then, the 
genealogy of the branching process is the same as the genealogy of the B's in the population 
up to time tj. Furthermore, groups of individuals in the branching process with the same type 
correspond to groups of lineages in the population that escape the selective sweep at the same 
time, and therefore get their allele at the neutral site from the same ancestor. Therefore, we will 
have = *m unless one of the following happens to a sampled lineage during the first stage of 
the selective sweep: 

1. One of the B lineages experiences recombination, but the allele at the neutral site comes 
from another B individual. 

2. Two recombinations cause a lineage to go from the B population to the b population, and 
then back into the B population. 

3. There is a coalescence event involving at least one lineage in the b population. 

More formally, the lemma below is a consequence of our construction. Note that the events $A_3$,
$A_4$, and $A_5$ correspond to the three possibilities mentioned above.

Lemma 6.1. Let Rj{i) = sup{t > : = 0} and Gj{i,j) = sup{t > : A^^(i) = 

^tj(i)}- ^6 ^^■"^ = *m on the event Ai PI • • • n A5, where 

Ai is the event that = J for some t, 

A2 is the event that the coupling holds for all t <tj, 

A3 is the event that for all t <tj for which Bt-i{It^2) = 1; have Bt-i{It,3) = 0, 

A4 is the event that for i G {!,... ,m}, we have Bt(A*^j(a{i))) = for all t < Rj{i), and 

A5 is the event that for all i,j G {!,..., m} with Gj{a{i), cr(j)) > 0, we have 
Proof. We have seen that when Ai and A2 occur, we have L^j = {i : B-rj{i) = 1} and a = a. For
integers u < t and i £ Lt, let Af{i) be the label of the individual in the branching process at time 

that is the ancestor of the individual labeled i at time .^t, unless the ancestor is of a different 
type then the individual labeled i at time t, in which case we define Af{i) = 0. Note that when 
Ai and A2 occur, we have i ~^ j if and only if A^^j{a{i)) = A^^j{a{j)) / for some t. 

Since a = a when Ai and A2 occur, we have i j if and only if A^^j{a{i)) = A^^j{a{j)) ^ 
for some t. Suppose j € Lj. It follows from the constructions that Al~^(j) = Al~^(j) unless 
j = It^i and /t^5 = 1. In this case, Al~^{j) = 0, and if A3 occurs then Bt-i{Al~^ (j)) = 0. 
It follows that if A4 also occurs, then A*^((T(i)) = ^^^((T(j)) 7^ if and only if we have both 
A*^j{a{i)) = A*^j{a{j)) and i?f(A*^((T(z))) = Bt{A^^j{a{j))) = 1. Furthermore, when A5 occurs, 
we have both A*^j{a{i)) = A^^j{a{j)) and Bt{A^^j{a{i))) = Bt{A''^j{a{j))) = 1 for some t if and 
only if A^j{a{i)) = A^j{a{j)), which is exactly the condition for i j. Thus, when Ai, . . . , A5 
all occur, we have i ~<i'„ j if and only if i j- 

It remains only to show that the marked blocks of and are the same. Note that i is 
in the marked block of if and only if a{i) = has the same type as the individual at time 
zero or, equivalently, if and only if A^j{a{i)) 7^ 0. The fact that this condition is equivalent to 
BQ{A^j{a{i))) = 1 follows from the coupling and conditions A3 and A4. □ 

We now use this coupling to show that the partition conditioned on the survival of the 
branching process has almost the same distribution as 

Lemma 6.2. Let tt be a partition of {1, . . . ,m}. Then, there exists a constant C such that 
\P'{i!m = vr|#Lt > for all i G N) - P{^^ = 7r)| < C/{\ogNf. 

Proof. We will show that if $A_1$ occurs, then $A_2 \cap \cdots \cap A_5$ occurs with high probability. Conditional
on the event that $X_{t-1} = k$ and that the coupling holds at time $t - 1$, it follows from (6.1)
that the probability that the coupling fails to hold at time t is $k^2(2 - s)/(2N)^2$. Likewise,
conditional on these same events, the probability that $B_{t-1}(I_{t,2}) = B_{t-1}(I_{t,3}) = 1$ is $(k/2N)^2$.
Thus, if $D_t$ is the event that t is the first integer such that either the coupling fails at time t or
$B_{t-1}(I_{t,2}) = B_{t-1}(I_{t,3}) = 1$, then $P'(D_t \mid X_{t-1} = k) \le (3 - s)k^2/(2N)^2$, where we use $P'$ because we
are not conditioning on the event that $X_t = 2N$ for some t. Therefore,

00 00 
P'(Ai n (A^2 U A^3)) < P'iDt n{t<Tj<^}) = Y, E'[P'{Dt n {t < rj < oo}|Xt_i)] 

3 - s 



t=i t=i 
00 



< 



t=i 
3 - s 



-X]^' ^ (2iV)2* ^^{Xt-i<J} - ^i^JqY^E'\^t-i'^{Xt-i<J}\ 
J 



00 



t=i 



\2 



Since $P'(X_t \ne X_{t-1} \mid X_{t-1} = k) = P(X_t \ne X_{t-1} \mid X_{t-1} = k) = p_k = k(2N - k)(2 - s)/(2N)^2$ and
$E'[U_k + D_k] \le C$, it follows that

J 



^ C ' k\2Nf ^ ' k ^ CJ' 

- A^2 Z-^ fc(2A^ -k) - ^^2N -k - N ' 

To handle $A_4$ and $A_5$, note that

$$P'(X_\tau = 2N \mid A_1) = p(0, 2N, J) = \frac{1 - (1 - s)^J}{1 - (1 - s)^{2N}} \ge 1 - (1 - s)^J. \quad (6.2)$$

It follows from and the proof of Proposition HI] that P'(Ai n A|) < C/(log A^)^. Likewise, 
it follows from ^^7^i and the proof of Proposition that P'(Ai n Ag) < C(log A^)/A^. 

Since $P'(A_1) = s/(1 - (1 - s)^J)$ by Lemma 3.1, it follows from the above calculations that
$|P'(A_1 \cap \cdots \cap A_5) - s| \le C/(\log N)^2$. Recall that $P'(X_\tau = 2N) = s/(1 - (1 - s)^{2N})$ by Lemma 3.1.
Since $\{\#L_t > 0 \text{ for all } t \in \mathbb{N}\}$ is the event that the branching process survives, it is well-known
that $P'(\#L_t > 0 \text{ for all } t \in \mathbb{N}) = s$. Furthermore, if $A_1 \cap \cdots \cap A_5$ occurs, then $X_t = J$ for some
t and $\#L_t = J$ for some t. Note that $P'(X_\tau = 2N \mid X_t = J \text{ for some } t) \ge 1 - (1 - s)^J$ as in (6.2)
and $P'(\#L_t > 0 \text{ for all } t \mid \#L_t = J \text{ for some } t) = 1 - (1 - s)^J$. Thus, the events $A_1 \cap \cdots \cap A_5$,
$\{X_\tau = 2N\}$, and $\{\#L_t > 0 \text{ for all } t\}$ agree closely enough that the probability, under $P'$, that
either all or none of these three events occurs is at least $1 - C/(\log N)^2$. It follows from this
observation, Lemma 6.1, and the fact that P is the conditional probability measure of $P'$ given
$X_\tau = 2N$ that

P'(^^ = 7r|#Lt > for ah i G N) = P'i^lm = A^i n • • • n A5) + 0((log N^^) 

= P'i^rr. = 7r| Ai n • • • n As) + 0((log N)'^) 
= P'i^rr. = 7r\Xr = 2N) + 0((log iV)-2) 
= P(^^ = 7r) + 0((logAf)-2), 

which proves the lemma. □ 



6.2 Infinite lines of descent 

Consider a continuous-time branching process in which each individual gives birth at rate 1 and
dies at rate $1 - s$. Equivalently, each individual lives for an exponentially distributed time with
mean $1/(2 - s)$, and then has some number of offspring, which is 0 with probability $(1 - s)/(2 - s)$
and 2 with probability $1/(2 - s)$. Say that an individual at time t has an infinite line of descent if
it has a descendant in the population at time u for all u > t. Otherwise, say that the individual
has a finite line of descent.

Define the process $(Y_t^{(1)}, Y_t^{(2)})_{t \ge 0}$ such that $Y_t^{(1)}$ is the number of individuals at time t having
an infinite line of descent and $Y_t^{(2)}$ is the number of individuals having a finite line of descent.
Gadag and Rajarshi (1992) show that this process is a two-type Markov branching process. They
also show that the behavior of the process can be described as follows. Let $p_k$ be the probability
that an individual has k offspring and let $f(x) = \sum_{k=0}^{\infty} p_k x^k$ be the generating function of the
offspring distribution. Let $u(x) = b[f(x) - x]$, where $b^{-1}$ is the mean lifetime of an individual. Let
$f^{(1)}(x, y) = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} p^{(1)}_{jk} x^j y^k$, where $p^{(1)}_{jk}$ is the probability that an individual with an infinite
line of descent has j offspring with an infinite line of descent and k offspring with a finite line
of descent. Let $f^{(2)}(x, y) = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} p^{(2)}_{jk} x^j y^k$, where $p^{(2)}_{jk}$ is the probability that an individual
with a finite line of descent has j offspring with an infinite line of descent and k offspring with a
finite line of descent. Let $u^{(1)}(x, y) = b[f^{(1)}(x, y) - x]$, and let $u^{(2)}(x, y) = b[f^{(2)}(x, y) - y]$. Let q
be the smallest nonnegative solution of the equation $u(x) = 0$, which is also the probability that
the branching process dies out. Then, by equation (4) of Gadag and Rajarshi (1992),

$$u^{(1)}(x, y) = \frac{u(x(1 - q) + yq) - u(yq)}{1 - q}, \quad \text{and} \quad u^{(2)}(x, y) = \frac{u(yq)}{q}.$$

In the case of interest to us, we have $f(x) = \frac{1 - s}{2 - s} + \frac{x^2}{2 - s}$, and therefore

$$u(x) = (2 - s)[f(x) - x] = (1 - s) + x^2 - (2 - s)x.$$

Since $u(x) = 0$ if and only if $x \in \{1 - s, 1\}$, we have $q = 1 - s$. It follows that

$$u^{(1)}(x, y) = \frac{[xs + y(1 - s)]^2 - (2 - s)[xs + y(1 - s)] - [y(1 - s)]^2 + (2 - s)[y(1 - s)]}{s} = sx^2 + 2(1 - s)xy - (2 - s)x.$$

Thus, an individual with an infinite line of descent lives for an exponentially distributed time 
with mean 1/(2 — s). It is replaced by two individuals with infinite lines of descent at rate s, 
and it is replaced by one individual with an infinite line of descent and another individual with 
a finite line of descent at rate 2(1 — s). 
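
The algebra in this subsection can be verified numerically. The sketch below assumes the formula $u^{(1)}(x, y) = [u(x(1 - q) + yq) - u(yq)]/(1 - q)$ with $q = 1 - s$, as quoted from Gadag and Rajarshi (1992), and checks that it agrees with the closed form $sx^2 + 2(1 - s)xy - (2 - s)x$. The function names are illustrative.

```python
def u(x, s):
    """u(x) = (2 - s)[f(x) - x] for births at rate 1 and deaths at rate 1 - s."""
    return (1 - s) + x * x - (2 - s) * x


def u1(x, y, s):
    """u^(1)(x, y) computed from u with extinction probability q = 1 - s."""
    q = 1 - s
    return (u(x * (1 - q) + y * q, s) - u(y * q, s)) / (1 - q)


def u1_closed_form(x, y, s):
    """The simplified expression s x^2 + 2(1 - s) x y - (2 - s) x."""
    return s * x * x + 2 * (1 - s) * x * y - (2 - s) * x


s = 0.2
print(u(1 - s, s), u(1.0, s))  # both roots of u
print(u1(0.4, 0.7, s), u1_closed_form(0.4, 0.7, s))
```

Reading off the closed form gives the rates stated above: the coefficient of $x^2$ is the rate s of splitting into two infinite lines, the coefficient of $xy$ is the rate 2(1 − s) of producing one infinite and one finite line, and the coefficient of x is minus the total rate 2 − s.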

Now, consider the process $(Y_t^{(1)}, Y_t^{(2)})$ started with one individual and conditioned to survive
forever, which is equivalent to assuming that $Y_0^{(1)} = 1$ and $Y_0^{(2)} = 0$. Assume, as in Proposition
2.7, that the individuals are assigned types, and that each new individual born is the same type
as its parent with probability $1 - r$ and is a new type with probability r. Define $\lambda^* = \inf\{t : Y_t^{(1)} = \lfloor Js \rfloor\}$.
Let $\lambda_k = \inf\{t : Y_t^{(1)} + Y_t^{(2)} = k\}$. Let $J_1 = \lfloor J(1 + s^{-1}\sqrt{(\log J)/J})^{-1} \rfloor$ and
$J_2 = \lceil J(1 - s^{-1}\sqrt{(\log J)/J})^{-1} \rceil$.

Lemma 6.3. We have $1 - P(\lambda_{J_1} \le \lambda^* \le \lambda_{J_2}) \le C/(\log N)^8$.

Proof. If S has a binomial(n, p) distribution and p < c < 1, then we have the large deviations
result that $P(S \ge cn) \le e^{-2n(c - p)^2}$ (see Johnson, Kotz, and Kemp (1992)).
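
The large deviations bound quoted here has the form of Hoeffding's inequality for binomial tails. As a quick check under that reading, the following sketch compares the exact upper tail of a binomial(n, p) variable with $e^{-2n(c - p)^2}$; the function names and parameter choices are illustrative.

```python
from math import ceil, comb, exp


def binomial_upper_tail(n, p, c):
    """P(S >= c * n) for S ~ binomial(n, p), summed from the smallest integer >= c * n."""
    start = ceil(c * n - 1e-12)  # small tolerance guards against floating-point rounding
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(start, n + 1))


def hoeffding_bound(n, p, c):
    """The bound exp(-2 n (c - p)^2), valid for p < c < 1."""
    return exp(-2 * n * (c - p) ** 2)


for n, p, c in [(50, 0.3, 0.5), (200, 0.1, 0.15), (30, 0.5, 0.9)]:
    print(n, p, c, binomial_upper_tail(n, p, c), hoeffding_bound(n, p, c))
```

In the proof the gap c − p is of order $\sqrt{(\log J)/J}$ and n is of order J, so the exponent is of order log J, which is what produces a power of J in the final bound.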

Let $S_1$ have a binomial$(J_1, s)$ distribution, and let $S_2$ have a binomial$(J_2, s)$ distribution. Let
$c = s + \sqrt{(\log J)/J}$. Then $J_1 = \lfloor Js/c \rfloor$, so $cJ_1 \le Js$ and therefore

F(A- < A,,.) = P(5, > IJsWS, > 0) = E^Elpm < nS.-. VcJ,p ^ P(S,>(c-i;W 

^ - V i-L JI i ; P{Si>0) - l-{l-sy^ - sY^ 

Recalling $J = \lfloor (\log N)^a \rfloor$ with $a > 4$, it follows that if $\epsilon > 0$ is small, then for large N

$$P(\lambda^* < \lambda_{J_1}) \le 2e^{-2J_1(\sqrt{(\log J)/J} - J_1^{-1})^2} \le Ce^{-2(J_1/J)\log J} \le CJ^{-(2 - \epsilon)} \le C/(\log N)^8.$$
Likewise, if $d = (1 - s) + \sqrt{(\log J)/J}$, then $J_2 = \lceil Js/(1 - d) \rceil$, so $(1 - d)J_2 \ge Js$ and thus

$$P(\lambda^* > \lambda_{J_2}) = P(S_2 < \lfloor Js \rfloor \mid S_2 > 0) \le P(S_2 < \lfloor Js \rfloor) = P(J_2 - S_2 > J_2 - \lfloor Js \rfloor) \le P(J_2 - S_2 \ge dJ_2).$$

Therefore, $P(\lambda^* > \lambda_{J_2}) \le e^{-2(J_2/J)\log J} \le J^{-2} \le C/(\log N)^8$, and the lemma follows. □

6.3 Proof of Proposition 2.7

We now prove Proposition 12.71 Recall that is the marked partition obtained by sampling m 
of the [JsJ individuals at time A* having an infinite line of descent and then declaring i and j 
to be in the same block of if and only if the ith. and j'th individuals in the sample have the 
same type. The marked block of consists of the individuals in the sample with the same type 
as the individual at time zero. We now define three other random marked partitions 
and Tm in the same way, except that the sample of m individuals is taken differently for each 
partition. Namely, to obtain we sample m of the individuals at time Aj. To get Tm\ we 

(3) 

sample m of the individuals at time To get Tm , we sample m of the individuals at time Ajj 
that have an infinite line of descent, assuming that m such individuals exist (otherwise, sample 
from all individuals at time Ajj). 

Since the branching process has been conditioned to survive forever, Tm"* has the same dis- 
tribution as the conditional distribution of given > for all t G N. Thus, by Lemma 
.2\ it suffices to show that for all marked partitions vr G we have 

|P(T«=^)-P(T^ = ^)|< ^ 



(log NY ■ 

Note also that T^m and have the same distribution by the strong Markov property. 

We can couple T^^ and such that the sample at time A j used to construct includes 
all of the the individuals in the sample at time Ajj that were born before time Aj. If there 
are fewer than m such individuals, the rest of the sample at time Aj can be picked from the 
remaining individuals. By the strong Markov property, this way of picking the sample at time 

Aj does not change the distribution of T^^- Therefore, T^^ = if the m individuals sampled 

(2) 

when constructing Tm were all born before time Aj. Likewise, we can couple the partitions 

(3) 

Tm and Tm such that on the event A* < Ajj, all of the individuals sampled at time that 
were born before time A* are part of the sample at time A* used to construct T^- Note that 
A* is a stopping time with respect to the process (y/^'*, l^''^^)j>o, so the strong Markov property 
implies that, conditional on (y/^\ Y'/^^)o<t<A* , all (^!^-') m-tuples of individuals with an infinite 
line of descent at time A* are equally likely to form the sample used to construct T^- With this 
coupling, Tm = f m if A* < and all individuals sampled when constructing were born 
before time A*. 

Since Tm"* =d Proposition 12.71 will be proved if the couplings described in the previous 
paragraph work well enough that P{t'^ ^ T^) and P{T^m / '^ni) can both be bounded by 
$C/(\log N)^2$. These bounds follow from Lemma 6.3 and Lemma 6.5 below.

Lemma 6.4. Let $(\xi'_t)_{t=0}^{\infty}$ be a random walk on $\mathbb{Z}$ such that $\xi'_0 = 1$ and, for all k, $P(\xi'_{t+1} = k + 1 \mid \xi'_t = k) = 1/(2 - s)$
and $P(\xi'_{t+1} = k - 1 \mid \xi'_t = k) = (1 - s)/(2 - s)$. Let $\xi = (\xi_t)_{t=0}^{\infty}$ be the
Markov process whose law is the same as the conditional law of $(\xi'_t)_{t=0}^{\infty}$ given $\xi'_t \ge 1$ for all t. Let
$\kappa_n = \inf\{t : \xi_t = n\}$. For all positive integers n, we have $E[\kappa_{n+1} - \kappa_n] \le (2 - s)/s$.

Proof. Note that $\kappa_1 = 0$ and $\kappa_2 = 1$. Therefore, $E[\kappa_2 - \kappa_1] = 1$. Suppose $E[\kappa_n - \kappa_{n-1}] \le (2 - s)/s$.
Let $D_n = \#\{t : \kappa_n \le t < \kappa_{n+1}, \xi_t = n, \text{ and } \xi_{t+1} = n - 1\}$ be the number of times that $\xi$ goes
from n to n − 1 before hitting n + 1. Since $l_n = P(\xi_t = n + 1 \mid \xi_{t-1} = n) \ge 1/(2 - s)$, we
have that $D_n + 1$ follows a geometric distribution with parameter $l_n \ge 1/(2 - s)$. Therefore,
$E[D_n] = (1/l_n) - 1 \le 1 - s$. Note that each time that $\xi$ goes from n to n − 1, it must eventually
return to n, which takes expected time $E[\kappa_n - \kappa_{n-1}]$. Thus,
$E[\kappa_{n+1} - \kappa_n] = 1 + E[D_n](1 + E[\kappa_n - \kappa_{n-1}]) \le 1 + (1 - s)[1 + (2 - s)/s] = (2 - s)/s$.
The lemma now follows by induction. □
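
The induction in Lemma 6.4 can be followed numerically. The sketch below iterates the upper-bound recursion $a_{n+1} = 1 + (1 - s)(1 + a_n)$ starting from $a_1 = 1$, where $a_n$ stands for the bound on the expected time to climb from level n to level n + 1 (the symbol $a_n$ and the function name are illustrative, not from the original). The sequence increases to the fixed point $(2 - s)/s$ and never exceeds it.

```python
def increment_bounds(s, n_max):
    """Upper bounds a_1, ..., a_{n_max} on the expected time to go from n to n + 1.

    a_1 = 1 because the conditioned walk must step up from level 1, and
    a_{n+1} = 1 + (1 - s) * (1 + a_n) uses E[D_n] <= 1 - s from the proof.
    """
    bounds = [1.0]
    for _ in range(n_max - 1):
        bounds.append(1.0 + (1.0 - s) * (1.0 + bounds[-1]))
    return bounds


s = 0.5
print(increment_bounds(s, 6), (2 - s) / s)
```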

Lemma 6.5. The probability that an individual chosen at random at time $\lambda_{J_2}$ was born after $\lambda_{J_1}$
is at most $C/(\log N)^2$.

Proof. Define {Yt)^Q such that if = tq < ri < . . . are the jump times of {Y^^^ + Y^'^^)t>o, then 
Yt = YrP + Yr^\ Let Afc = inf{t : Yt = k}. The number of births between Ajj and Xj^ is at most 
Aj2 - Aji. We have E[Xj^ - AjJ < [(2 - s)/s]{J2 - Ji) by Lemma lOl Note that 



J2-J1 ^ J(l - S-^y^l^^jyj)-^ - J(l + S- V(l0g J)/J)-^ + 2 ^ ^ /b^ 



J2 - j{i + s-^^{iogj)/jy^ - V J ' 

so the probability that a randomly-chosen individual at time Ajj was born after Aj^ is at most 



logJ< C 



J2 J ~ V J - (logAf)2' 
where the last inequality holds because $J = \lfloor (\log N)^a \rfloor$ for some $a > 4$. □



7 Approximating the distribution of G 

In this section, we complete the proof of Theorem 1.2 by proving Propositions 2.10, 2.11, and
2.8. We will use the notation $W_k$, $\xi_k$, $Y_k$, and $Z_i$ introduced before the statement of Theorem 1.2
in the introduction. Recall also that $L = \lfloor 2Ns \rfloor$.

In subsection 7.1, we prove Propositions 2.10 and 2.11, which pertain to the random variables
$Z_i$ introduced in the paintbox construction given in the introduction. The rest of the section
is devoted to the proof of Proposition 2.8. In subsection 7.2, we introduce random variables $Z'_i$
using the branching process. In subsection 7.3, we state some lemmas comparing the $Z_i$ and $Z'_i$
and explain how these lemmas imply Proposition 2.8. In subsection 7.4, we present some results
related to Polya urns that are needed to prove these lemmas, and finally the lemmas are proved
in subsection 7.5.



7.1 Proofs of Propositions 2.10 and 2.11

Proof of Proposition 2.10. Since $P(Z_1 = Z_2 = k \mid V_k) \le V_k^2$, we have $P(Z_1 = Z_2 = k) \le E[V_k^2] = E[\xi_k^2 W_k^2] = E[\xi_k^2]E[W_k^2]$.
Since $E[\xi_k^2] = E[\xi_k] = r/s$ and $E[W_k^2] = 2/k(k + 1)$, it follows that

^ 2r 2r C 
We next prove Proposition 2.11, which says that the distribution of the number of i such that
$Z_i > \lfloor Js \rfloor$ is approximately binomial. We begin with a lemma which gives an approximation to
$P(Z_i > \lfloor Js \rfloor)$.

Lemma 7.1. P{Zi > [Js\) = qj + 0{l/{logNf) . 



Proof. By the construction in the introduction, $P(Z_i = k \mid Z_i \le k) = E[V_k] = E[\xi_k]E[W_k] = r/sk$.
Therefore, $P(Z_i \le \lfloor Js \rfloor) = \prod_{k=\lfloor Js \rfloor + 1}^{L}(1 - r/sk)$. This is the same as the probability that none
of the events $A_{\lfloor Js \rfloor + 1}, \dots, A_L$ occurs if the events are independent and $P(A_k) = r/sk$. Since



L 

E 

fe=[JsJ+l 



r 

sk 



< 



< 



C 



s\Js\ - (logiV)6' 



it follows from the Poisson approximation result on p. 140 of Durrett (1996) that 

L 

P{Zi > lJs\) = 1 - exp ' 



k=l,Js\+l 



1 



{log NY 



If 1 < yi< y2, then < ztLi I 



logm < 2/[yi\. Therefore, 



2N ^ l2Nsi ^ 



k=J+l 



A;=[JsJ+l 



1 

^7 + 



yi 

2N 



V^l 1 ^2iV\ ^ f2Ns\ 



k=J 

2 



k 

< 



[2Ns\ ^ 

^ k 

fc=[JsJ 



c 



J [Js\ - (logiV)''' 



It follows that 

P{Z, > [Js\) = 1-exp 



2N 



k=J+l 



E iho 



1 



{log NY 



qj + 



1 



(log NY 



□ 



Proof of Proposition 2.11. Let $\eta_k = \#\{i : Z_i = k\}$. Then $D = \eta_{\lfloor Js \rfloor + 1} + \cdots + \eta_L$. Define
the sequence $(\tilde{\eta}_k)_{k=\lfloor Js \rfloor + 1}^{L}$ such that $\tilde{\eta}_L$ has a Binomial(n, r/sL) distribution and, conditional on
$\tilde{\eta}_{k+1}, \dots, \tilde{\eta}_L$, the distribution of $\tilde{\eta}_k$ is Binomial with parameters $n - \tilde{\eta}_{k+1} - \cdots - \tilde{\eta}_L$ and r/sk.
Thinking of flipping n coins and continuing to flip those that don't show tails, it is easy to
see that $\tilde{D} = \tilde{\eta}_{\lfloor Js \rfloor + 1} + \cdots + \tilde{\eta}_L$ has a binomial distribution with parameters n and $\gamma$, where
$\gamma = P(Z_i > \lfloor Js \rfloor)$. To compare $D$ and $\tilde{D}$ we note that



P{Vk > 2|??fc+i,...,??L) < 



n 



n 



2r 



2 J sk{k + l) 



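and $P(\tilde{\eta}_k \ge 2 \mid \tilde{\eta}_{k+1}, \dots, \tilde{\eta}_L) \le \binom{n}{2}(r/sk)^2$. By Lemma 5.1, we can couple the $\eta_k$ and $\tilde{\eta}_k$ such that
$P(\eta_k \ne \tilde{\eta}_k \mid \eta_l = \tilde{\eta}_l \text{ for } l = k + 1, \dots, L) \le Cr/k^2$ for all k. Therefore,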



v-^ Cr Cr 

P{Vk + m for some k > [Js\) < T2 - TTT - 



C 



k=lJs\+l 

This result, combined with Lemma 7.1, gives the proposition.



A;2 - [Js\ - {log NY 



□ 
7.2 Random variables $Z'_i$ from the branching process

It remains only to prove Proposition 2.8, which requires considerably more work. For convenience,
let $H = \lfloor Js \rfloor$. From this point forward, $Z_1, \dots, Z_n$ will be random variables defined as in the
introduction but with L = H, so that the associated marked partition $\Pi$ has the distribution
$Q_{r,s,H}$. Our goal is to describe the distribution of the marked partition $\Upsilon_n$ from Propositions 2.7
and 2.8 using random variables $Z'_1, \dots, Z'_n$, where $Z'_i$ will be the number of individuals with an
infinite line of descent at the time when the type of the ith individual first appeared. We will
then prove Proposition 2.8 by comparing the distribution of $(Z'_1, \dots, Z'_n)$ to the distribution of
$(Z_1, \dots, Z_n)$.

Define times $0 = \gamma_1 < \gamma_2 < \cdots < \gamma_H$ such that $\gamma_j = \inf\{t : Y_t^{(1)} = j\}$ is the first time that
the branching process has j individuals with an infinite line of descent. Note that $(\gamma_{j+1} - \gamma_j)_{j=1}^{H-1}$
is a sequence of independent random variables, and the distribution of $\gamma_{j+1} - \gamma_j$ is exponential
with rate js. Whenever a new individual with an infinite line of descent is born, it has a new
type with probability r. Also, each individual with an infinite line of descent is giving birth to
a new individual with a finite line of descent at rate $2(1 - s)$. Since a new individual has a new
type with probability r, between times $\gamma_j$ and $\gamma_{j+1}$, births of individuals with new types occur
at rate $2jr(1 - s)$. Whenever such a birth occurs, the type of the individual with an infinite
line of descent changes with probability 1/2. Thus, between times $\gamma_j$ and $\gamma_{j+1}$, we can view the
branching process as consisting of j lineages with infinite lines of descent, and their types are
changing at rate $r(1 - s)$. It follows that if, for some $j \ge 1$, we choose at random one of the
j individuals at time $\gamma_{j+1}-$ with an infinite line of descent, the probability that its ancestor at
time $\gamma_j$ is not of the same type is

$$\frac{r(1 - s)}{r(1 - s) + js}. \quad (7.1)$$

Furthermore, for $j \ge 2$, the probability that its ancestor at time $\gamma_j$ is not of the same type as its
ancestor at time $\gamma_j-$ is r/j because, with probability r, exactly one of the individuals at time
$\gamma_j$ is of a type that did not exist at time $\gamma_j-$. It follows that for $j \ge 2$, the probability that the
individual sampled at time $\gamma_{j+1}-$ has a different type from its ancestor at time $\gamma_j-$ is

$$\frac{r(1 - s)}{r(1 - s) + js} + \frac{js}{r(1 - s) + js}\left(\frac{r}{j}\right) = \frac{r}{r(1 - s) + js} \le \frac{r}{js}. \quad (7.2)$$

Likewise, the probability that at least one of the j individuals with an infinite line of descent at
time $\gamma_{j+1}-$ has a different ancestor at time $\gamma_j-$ is

$$\frac{r(1 - s)}{r(1 - s) + s} + \frac{s}{r(1 - s) + s} \cdot r = \frac{r}{r(1 - s) + s}.$$
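
The competing-rates computation behind (7.1) and (7.2) can be checked with exact rational arithmetic. The sketch below assumes, as in the text, that a sampled lineage changes type at rate r(1 − s) while the next individual with an infinite line of descent arrives at rate js, and that the birth at time $\gamma_j$ gives the sampled lineage a new type with probability r/j. The function names are illustrative.

```python
from fractions import Fraction


def type_change_before_next_birth(j, r, s):
    """Probability (7.1): the type-change clock rings before the rate-js birth clock."""
    return r * (1 - s) / (r * (1 - s) + j * s)


def differs_from_ancestor_just_before(j, r, s):
    """Left-hand side of (7.2): change during the interval, or new type at the birth."""
    change = type_change_before_next_birth(j, r, s)
    return change + (1 - change) * r / j


r, s = Fraction(1, 50), Fraction(1, 10)
for j in (2, 5, 10):
    total = differs_from_ancestor_just_before(j, r, s)
    print(j, total, r / (r * (1 - s) + j * s), r / (j * s))
```

The upper bound r/(js) is the same quantity r/sk that governs the variables $Z_i$ in the paintbox construction, which is why the two sets of variables can be compared term by term.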

Let $\sigma'(1), \dots, \sigma'(n)$ represent n individuals sampled at random from those with an infinite
line of descent at time $\gamma_H$. Then we can take the partition $\Upsilon_n$ to be defined such that $i \sim_{\Upsilon_n} j$
if and only if $\sigma'(i)$ and $\sigma'(j)$ have the same type, and the marked block is $\{i : \sigma'(i)$ has the same
type as the individual at time 0$\}$. Now define $Z'_1, \dots, Z'_n$ as follows. Let $Z'_i = 1$ if the ancestor at
time 0 of $\sigma'(i)$ has the same type as $\sigma'(i)$. Otherwise, define

$$Z'_i = \max\{k : \sigma'(i) \text{ has a different type from its ancestor at time } \gamma_k-\}.$$
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If Z[ 7^ Zj, then since each new type is different from all types previously in the population, 
o' {i) and cr'(j) have different types. If Z[ = Z'-, then and have the same type unless 
o"'(i) and cr'(j) have different ancestors at time ^z'+i~ because they both have the same type 
as their ancestor at time ^z'+i~- We will show in Lemma 17.21 below that the probability that 
Z[ = Z'- and o'ii) and <T'(j) have different ancestors at time 7z'+i— is 0((log A^)~^). Therefore, 
the probability that, for some i and j, we have Z[ = Z'- but (J {i) and (T'{j) have different types 
is 0((log A^)^^). Furthermore, it follows from (|7.1j) that the individuals {(T'{i) ■ Z[ = 1} have 
the same type as the individual at time with probability s/(r(l — s) + s). Define the marked 
partition of {1, . . . ,n} such that i ~x'^ j if and only if Z[ = Zj, and independently with 
probability s/(r(l — s) + s), mark the block {i : Z[ = 1}. The preceding discussion implies that 

|P(T„ = vr) - P{r^ = vr)! < — ^ (7.3) 

(log A*)^ 

for all vr G Vn- Thus, for proving Proposition 12.81 we may consider instead of T„. This will 
be convenient because is defined from Z[, . . . , Z'j^ in the same way that 11 is defined from 
Zi, . . . , Zn- Consequently, once we establish Lemma [7.21 below, the remainder of the proof of 
Proposition 12.81 will just involve comparing the Zi and Z'-. 

Lemma 7.2. If $i \ne j$ then

$$P(Z'_i = Z'_j \text{ and } \sigma'(i) \text{ and } \sigma'(j) \text{ have different ancestors at time } \tau_{Z'_i+1}-) \le \frac{C}{(\log N)^2}. \tag{7.4}$$

Proof. First note that if $Z'_i = Z'_j = k$, then $\sigma'(i)$ and $\sigma'(j)$ have the same type as their ancestor
at time $\tau_{k+1}-$. If they have different ancestors at time $\tau_{k+1}-$, there must be a $\tau \in (\tau_k, \tau_{k+1})$
such that either $\sigma'(i)$ or $\sigma'(j)$ has an ancestor of a different type at time $\tau-$ but not at time $\tau$.
The other of $\sigma'(i)$ and $\sigma'(j)$ must have an ancestor of a different type at time $\tau_k-$ than at time
$\tau-$. Given that $\sigma'(i)$ and $\sigma'(j)$ have different ancestors at time $\tau_{k+1}-$, the probability that both
of these things happen if $k \ge 2$ is

$$\left(\frac{2r(1-s)}{2r(1-s)+ks}\right)\left(\frac{r}{r(1-s)+ks}\right) \le \frac{2r^2}{k^2s^2}.$$

The first factor is the probability that $\sigma'(i)$ or $\sigma'(j)$ has an ancestor of a different type at some
time $\tau-$, while the second factor is the probability from (7.2) that the other of $\sigma'(i)$ and $\sigma'(j)$
has an ancestor of a different type at time $\tau_k-$ than at time $\tau-$. If $k = 1$, then this conditional
probability becomes

$$\left(\frac{2r(1-s)}{2r(1-s)+ks}\right)\left(\frac{r(1-s)}{r(1-s)+ks}\right) \le \frac{2r^2}{k^2s^2}$$

by (7.1). Therefore, if $i \ne j$, the probability that $Z'_i = Z'_j$ and $\sigma'(i)$ and $\sigma'(j)$ have different
ancestors at time $\tau_{Z'_i+1}-$ is at most $\sum_{k=1}^{H} 2r^2/(k^2s^2) \le C/(\log N)^2$, as claimed. □

7.3 Comparison of the $Z_i$ and $Z'_i$ and proof of Proposition 2.8

We first prove two fairly straightforward lemmas, one for the $Z_i$ and one for the $Z'_i$. Lemma 7.3
allows us to disregard the possibility that the $Z'_i$ may take more than two distinct values greater
than one, as well as the possibility that there may be two distinct values greater than one, with
multiple occurrences of the higher value. Lemma 7.4 rules out the same possibilities for the $Z_i$.

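Lemma 7.3.

$$P(Z'_1 = j, Z'_2 = k, Z'_3 = l \text{ for some } 2 \le j < k < l) \le \frac{C(\log(\log N))^3}{(\log N)^3}, \tag{7.5}$$

$$P(Z'_1 = j, Z'_2 = Z'_3 = k \text{ for some } 2 \le j < k) \le \frac{C}{(\log N)^2}. \tag{7.6}$$

Proof. From (7.2) we get $P(Z'_3 = l) \le r/sl$, $P(Z'_2 = k \mid Z'_3 = l) \le r/sk$, and
$P(Z'_1 = j \mid Z'_2 = k, Z'_3 = l) \le r/sj$. Thus, the probability on the left-hand side of (7.5) is at most

$$\sum_{j=2}^{H}\sum_{k=j}^{H}\sum_{l=k}^{H}\left(\frac{r}{ls}\right)\left(\frac{r}{ks}\right)\left(\frac{r}{js}\right) \le \frac{C(\log(\log N))^3}{(\log N)^3}.$$

Conditional on the event that $\sigma'(2)$ and $\sigma'(3)$ have different ancestors at time $\tau_{m+1}-$, the
probability that they have the same ancestor at time $\tau_m-$ is $\binom{m}{2}^{-1} = 2/m(m-1)$. Therefore,
the probability that $\sigma'(2)$ and $\sigma'(3)$ have the same ancestor at time $\tau_{k+1}-$ is at most
$\sum_{m=k+1}^{H} 2/m(m-1) \le 2/k$. The probability that $Z'_2 = Z'_3 = k$ given that $\sigma'(2)$ and $\sigma'(3)$ have
the same ancestor at time $\tau_{k+1}-$ is at most $r/ks$. Also, for $j < k$, we have
$P(Z'_1 = j \mid Z'_2 = Z'_3 = k) \le r/js$. Combining these results with Lemma 7.2, we can bound the
probability on the left-hand side of (7.6) by

$$\frac{C}{(\log N)^2} + \sum_{j=2}^{H}\sum_{k=j+1}^{H}\left(\frac{r}{js}\right)\left(\frac{r}{ks}\right)\left(\frac{2}{k}\right) \le \frac{C}{(\log N)^2} + \frac{2r^2}{s^2}\sum_{j=2}^{H}\sum_{k=j+1}^{H}\frac{1}{jk^2} \le \frac{C}{(\log N)^2}. \qquad □$$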
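The bound $\sum_{m \ge k+1} 2/m(m-1) \le 2/k$ used in the preceding proof comes from a telescoping sum, since $2/m(m-1) = 2/(m-1) - 2/m$. A minimal exact check is sketched below; the function name and the finite upper limit `M` are assumptions introduced only for this illustration.

```python
from fractions import Fraction


def coalescence_bound(k, M):
    """Exact value of the sum of 2/(m(m-1)) over m = k+1, ..., M (requires k >= 1)."""
    return sum(Fraction(2, m * (m - 1)) for m in range(k + 1, M + 1))


if __name__ == "__main__":
    k, M = 3, 50                       # illustrative values only
    total = coalescence_bound(k, M)
    # Telescoping gives 2/k - 2/M, which never exceeds 2/k.
    print(total, Fraction(2, k) - Fraction(2, M))
```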

Lemma 7.4.

$$P(Z_1 = j, Z_2 = k, Z_3 = l \text{ for some } 2 \le j < k < l) \le \frac{C(\log(\log N))^3}{(\log N)^3},$$

$$P(Z_1 = j, Z_2 = Z_3 = k \text{ for some } 2 \le j < k) \le \frac{C}{(\log N)^2}. \tag{7.7}$$

Proof. Fix $j, k, l$ such that $2 \le j < k < l \le H$. We have $P(Z_3 = l \mid Z_3 \le l) = r/sl$,
$P(Z_2 = k \mid Z_3 = l, Z_2 \le k) = r/sk$, $P(Z_1 = j \mid Z_2 = k, Z_3 = l, Z_1 \le j) = r/sj$, and hence

$$P(Z_1 = j, Z_2 = k, Z_3 = l) \le \left(\frac{r}{sj}\right)\left(\frac{r}{sk}\right)\left(\frac{r}{sl}\right).$$

Summing as in the proof of Lemma 7.3 gives the first result. To prove (7.7), first note that

$$P(Z_2 = Z_3 = k) \le E[V_k^2] = E[\xi_k]E[W_k^2] = \frac{2r}{sk(k+1)}$$

and $P(Z_1 = j \mid Z_2 = Z_3 = k) \le r/sj$, then compute as in the proof of Lemma 7.3. □

Throughout the rest of this section, we will use the notation

$$q_{k,a,n} = \frac{(k-1)\,a!\,(n-a+k-2)!}{(n+k-1)!}.$$

We now state four more lemmas related to the $Z_i$ and $Z'_i$. Their proofs will be given after we
explain how they imply Proposition 2.8.

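Lemma 7.5. Suppose $1 \le a \le n-1$. Then

$$P(Z'_1 = l, Z'_2 = \cdots = Z'_{a+1} = k, Z'_{a+2} = \cdots = Z'_n = 1 \text{ for some } 2 \le k < l)
= \frac{r^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,a,n-1}}{l} + O\left(\frac{1}{(\log N)^2}\right).$$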

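Lemma 7.6. Suppose $1 \le a \le n-1$. Then

$$P(Z_1 = l, Z_2 = \cdots = Z_{a+1} = k, Z_{a+2} = \cdots = Z_n = 1 \text{ for some } 2 \le k < l)
= \frac{r^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,a,n-1}}{l} + O\left(\frac{1}{(\log N)^2}\right).$$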

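Lemma 7.7. If $2 \le a \le n$, then

$$P(Z'_1 = \cdots = Z'_a = k \text{ and } Z'_{a+1} = \cdots = Z'_n = 1 \text{ for some } k \ge 2)
= \frac{r}{s}\sum_{k=2}^{H} q_{k,a,n} - \frac{nr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,a,n}}{l} + O\left(\frac{1}{(\log N)^2}\right) \tag{7.8}$$

and

$$P(Z'_1 = k \text{ and } Z'_2 = \cdots = Z'_n = 1 \text{ for some } k \ge 2)
= \frac{r}{s}\sum_{k=2}^{H} q_{k,1,n} - \frac{nr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,1,n}}{l}
- \frac{(n-1)r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{1}{k(n+l-2)} + O\left(\frac{1}{(\log N)^2}\right). \tag{7.9}$$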

Lemma 7.8. If $2 \le a \le n$, then

$$P(Z_1 = \cdots = Z_a = k \text{ and } Z_{a+1} = \cdots = Z_n = 1 \text{ for some } k \ge 2)
= \frac{r}{s}\sum_{k=2}^{H} q_{k,a,n} - \frac{nr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,a,n}}{l} + O\left(\frac{1}{(\log N)^2}\right) \tag{7.10}$$

and

$$P(Z_1 = k \text{ and } Z_2 = \cdots = Z_n = 1 \text{ for some } k \ge 2)
= \frac{r}{s}\sum_{k=2}^{H} q_{k,1,n} - \frac{nr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,1,n}}{l}
- \frac{(n-1)r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{1}{k(n+l-2)} + O\left(\frac{1}{(\log N)^2}\right). \tag{7.11}$$

Proof of Proposition 2.8. Let $\pi \in \mathcal{P}_n$. If $\pi$ has four or more blocks, or three blocks of size at
least two, then $P(\Upsilon'_n = \pi) \le C/(\log N)^2$ by Lemma 7.3 and $Q_{r,s,H}(\pi) \le C/(\log N)^2$ by Lemma
7.4. If $\pi$ has three blocks, at least one containing just one integer, then the fact that
$|P(\Upsilon'_n = \pi) - Q_{r,s,H}(\pi)| \le C/(\log N)^2$ follows from Lemmas 7.3, 7.4, 7.5, and 7.6, as well as the fact that
the probabilities that the blocks $\{i : Z_i = 1\}$ and $\{i : Z'_i = 1\}$ are marked in the two partitions are
both $s/(r(1-s)+s)$. If $\pi$ has just two blocks, then $|P(\Upsilon'_n = \pi) - Q_{r,s,H}(\pi)| \le C/(\log N)^2$ follows
from Lemmas 7.7 and 7.8, Lemmas 7.5 and 7.6 with $a = n-1$, and equations (7.6) and (7.7).
Finally, when $\pi$ has just one block, $|P(\Upsilon'_n = \pi) - Q_{r,s,H}(\pi)| \le C/(\log N)^2$ follows from Lemmas
7.7 and 7.8 with $a = n$, and the fact that $P(Z_1 = \cdots = Z_n = 1)$ and $P(Z'_1 = \cdots = Z'_n = 1)$
can be obtained by subtracting from one the remaining possibilities. Proposition 2.8 now follows
from these results and (7.3). □

7.4 Polya urn facts 

It remains to prove Lemmas 7.5, 7.6, 7.7, and 7.8. In this subsection, we establish three lemmas
that are related to Polya urns. The first two lemmas are standard and straightforward, and their
proofs are omitted.

Lemma 7.9. Suppose $X$ has a beta distribution with parameters $1$ and $k-1$, where $k$ is an
integer. Let $U_1, \dots, U_n$ be i.i.d. random variables with a uniform distribution on $[0, 1]$. Then

$$P(U_i \le X \text{ for } i = 1, \dots, a \text{ and } U_i > X \text{ for } i = a+1, \dots, n) = q_{k,a,n}.$$



Lemma 7.10. Consider an urn with one red ball and $k-1$ black balls. Suppose that $n$ new balls
are added to the urn one at a time. Each new ball is either red or black, and the probability that
a given ball is red is equal to the fraction of red balls currently in the urn. Let $S$ be any $a$-element
subset of $\{1, \dots, n\}$. The probability that the $i$th ball added is red for $i \in S$ and black for $i \notin S$ is
$q_{k,a,n}$. Note that this implies the sequence of draws is exchangeable.
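Lemma 7.10 can be verified directly for small parameters by multiplying the step-by-step urn probabilities and comparing with the closed form for $q_{k,a,n}$. The sketch below does this with exact fractions; the function names `q` and `urn_sequence_prob` and the choice $k = 4$, $n = 5$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def q(k, a, n):
    """q_{k,a,n} = (k-1) a! (n-a+k-2)! / (n+k-1)!, for integers k >= 2 and 0 <= a <= n."""
    return Fraction((k - 1) * factorial(a) * factorial(n - a + k - 2), factorial(n + k - 1))


def urn_sequence_prob(k, colors):
    """Exact probability that the added balls have the given colors (True = red).

    The urn starts with one red ball and k - 1 black balls.
    """
    red, total, prob = 1, k, Fraction(1)
    for is_red in colors:
        p_red = Fraction(red, total)
        prob *= p_red if is_red else 1 - p_red
        red += is_red      # a red draw adds one red ball
        total += 1         # every draw adds one ball
    return prob


if __name__ == "__main__":
    k, n = 4, 5            # illustrative values only
    # Every color sequence with a reds has probability q_{k,a,n}, whatever the order.
    for colors in product([False, True], repeat=n):
        assert urn_sequence_prob(k, colors) == q(k, sum(colors), n)
    # The probabilities of all 2^n sequences add up to one.
    assert sum(comb(n, a) * q(k, a, n) for a in range(n + 1)) == 1
    print("checked", 2 ** n, "sequences")
```

Because the probability of a sequence depends only on its number of red balls, the check also illustrates the exchangeability noted in the lemma.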

Lemma 7.11. In the setting of Lemma 7.10, suppose instead $l-k$ new balls are added to the
urn. Then, suppose we sample $n$ of the $l$ balls at random. Let $p_{k,l,a,n}$ be the probability that the
first $a$ balls sampled are red and the next $n-a$ are black. If $a \ge 1$, then there exists a constant
$C$, which may depend on $a$ and $n$, such that $|p_{k,l,a,n} - q_{k,a,n}| \le C/kl$ for all $k$ and $l$.
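For small parameters, $p_{k,l,a,n}$ can be computed exactly, which gives a feel for the size of $kl\,|p_{k,l,a,n} - q_{k,a,n}|$. The sketch below conditions on the number of red balls among the $l-k$ added balls, whose law follows from Lemma 7.10, and then samples $n$ of the $l$ balls in order without replacement. The function names and the parameter grid are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def q(k, a, n):
    """q_{k,a,n} = (k-1) a! (n-a+k-2)! / (n+k-1)!, for integers k >= 2 and 0 <= a <= n."""
    return Fraction((k - 1) * factorial(a) * factorial(n - a + k - 2), factorial(n + k - 1))


def falling(x, m):
    """Falling factorial x (x-1) ... (x-m+1); equals 1 when m == 0."""
    out = 1
    for i in range(m):
        out *= x - i
    return out


def p(k, l, a, n):
    """Exact p_{k,l,a,n}; requires l >= k and l >= n."""
    added = l - k
    total = Fraction(0)
    for m in range(added + 1):                    # m = number of red balls among those added
        weight = comb(added, m) * q(k, m, added)  # probability of exactly m red additions
        reds = 1 + m                              # red balls among all l balls
        total += weight * Fraction(falling(reds, a) * falling(l - reds, n - a), falling(l, n))
    return total


if __name__ == "__main__":
    a, n = 1, 2                                   # illustrative values only
    for k, l in [(2, 10), (2, 100), (5, 100), (20, 100)]:
        err = abs(p(k, l, a, n) - q(k, a, n))
        print(k, l, float(k * l * err))           # scaled error, bounded according to Lemma 7.11
```

For $k = 2$, $a = 1$, $n = 2$ the computation can be done by hand: the number of red balls is uniform on $\{1, \dots, l-1\}$, so $p_{2,l,1,2} = (l+1)/6(l-1)$ while $q_{2,1,2} = 1/6$, and the difference is $1/3(l-1)$.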



Proof. It follows from Lemma 7.10 that, conditional on the event that none of the original $k$ balls
is in the sample of $n$, the probability that the first $a$ balls sampled are red and the next $n-a$
are black is exactly $q_{k,a,n}$. The probability that the sample of $n$ balls contains exactly $j$ of the
original $k$ balls, an event we call $D_{j,k}$, is

$$\frac{\binom{k}{j}\binom{l-k}{n-j}}{\binom{l}{n}} \le C\left(\frac{k}{l}\right)^j, \tag{7.12}$$

since $n$ is a constant and thus so are $a \le n$ and $j \le n$.

Conditional on the event $D_{j,k}$, we can calculate the probability that we sample $a$ red balls
and $n-a$ black balls. The probability that the original red ball is in the sample is $j/k$. If it
is, then by Lemma 7.10 the probability that $a-1$ of the other balls in the sample are red is
$\binom{n-j}{a-1} q_{k,a-1,n-j}$. Likewise, conditional on the event that the original red ball is not in the sample,
the probability that $a$ of the other balls in the sample are red is $\binom{n-j}{a} q_{k,a,n-j}$. Thus, conditional
on $D_{j,k}$, the probability that we sample $a$ red balls and $n-a$ black balls is

$$\frac{j}{k}\cdot\frac{(n-j)!\,(k-1)\,(n-j-a+k-1)!}{(n-j-a+1)!\,(n-j+k-1)!}
+ \frac{k-j}{k}\cdot\frac{(n-j)!\,(k-1)\,(n-j-a+k-2)!}{(n-j-a)!\,(n-j+k-1)!}.$$

Our next step is to bring $\binom{n}{a} q_{k,a,n}$ out in front. Using that $(m-j)! = m!/(m)_j$ for integers
$1 \le j \le m$, we get, for $k \ge 3$,

$$\binom{n}{a}\frac{(k-1)\,a!\,(n-a+k-2)!}{(n+k-1)!}
\left[\frac{(n-a)_{j-1}}{(n)_j}\cdot\frac{j}{k}\cdot\frac{(n+k-1)_j}{(n-a+k-2)_{j-1}}
+ \frac{(n-a)_j}{(n)_j}\cdot\frac{k-j}{k}\cdot\frac{(n+k-1)_j}{(n-a+k-2)_j}\right]. \tag{7.13}$$



Consider the expression in brackets. Each term can be written as a ratio of two polynomials in
$k$ of the same degree. Since $a \le n$ and $j \le n$, if $k \to \infty$ with $n$ fixed, the expression in brackets
is bounded by a constant. Now, suppose $a = 1$. The bracketed expression becomes

$$\frac{j(n+k-1)(n+k-2)}{nk(n+k-j-1)} + \frac{(n-j)(k-j)(n+k-1)(n+k-2)}{nk(n+k-j-1)(n+k-j-2)}$$

$$= \frac{j(n+k-1)(n+k-2)(n+k-j-2) + (n-j)(k-j)(n+k-1)(n+k-2)}{nk(n+k-j-1)(n+k-j-2)}.$$

Both the numerator and denominator of this fraction can be written as third-degree polynomials
in $k$ whose leading term is $nk^3$. Consequently, this fraction minus 1 can be written as a second-degree
polynomial in $k$ divided by a third-degree polynomial in $k$, which can be bounded by $Ck^{-1}$
for some constant $C$.
Note that

$$q_{k,a,n} = \frac{(k-1)\,a!\,(n-a+k-2)!}{(n+k-1)!} \le \frac{a!\,(n-a+k-2)!}{(n+k-2)!} = \frac{a!}{(n+k-2)_a} \le \frac{C}{k^a}. \tag{7.14}$$

To compare $p_{k,l,a,n}$ and $q_{k,a,n}$ when $a \ge 2$, we will break up the probability $p_{k,l,a,n}$ by conditioning
on the number of the original $k$ balls that were sampled. Conditional on sampling $j \ge 1$ of the
original $k$ balls, the probability that the first $a$ balls sampled are red and the next $n-a$ are black
is $\binom{n}{a}^{-1}$ times the probability in (7.13), which can be bounded by $Cq_{k,a,n}$. The probability of
sampling $j$ of the original $k$ balls is at most $C(k/l)^j$ by (7.12), so

$$|p_{k,l,a,n} - q_{k,a,n}| \le C\sum_{j=1}^{n}\left(\frac{k}{l}\right)^j q_{k,a,n} \le Ck^{-a}\left(\frac{k}{l}\right) \le \frac{C}{kl}.$$

Finally, when $a = 1$, we have

$$|p_{k,l,1,n} - q_{k,1,n}| \le \sum_{j=1}^{n} C\left(\frac{k}{l}\right)^j\left(\frac{C}{k}\right) q_{k,1,n}
\le \frac{C}{k^2}\sum_{j=1}^{n}\left(\frac{k}{l}\right)^j \le \frac{C}{kl}. \qquad □$$

7.5 Proofs of Lemmas 7.5, 7.6, 7.7, and 7.8

Proof of Lemma 7.5. For $2 \le k < l$, let $A_1^{k,l}$ be the event that $\sigma'(1), \dots, \sigma'(n)$ all have distinct
ancestors at time $\tau_{l+1}-$. Let $A_2^{k,l}$ be the event that the ancestor of $\sigma'(1)$ at time $\tau_l-$ has a
different type from the ancestor of $\sigma'(1)$ at time $\tau_{l+1}-$. Let $A_3^{k,l}$ be the event that one of the $k$
individuals at time $\tau_{k+1}-$ is the ancestor of $\sigma'(2), \dots, \sigma'(a+1)$ but not $\sigma'(a+2), \dots, \sigma'(n)$, and
let $A_4^{k,l}$ be the event that the ancestor of this individual at time $\tau_k-$ has a different type. We
claim that

$$P(Z'_1 = l, Z'_2 = \cdots = Z'_{a+1} = k, Z'_{a+2} = \cdots = Z'_n = 1 \text{ for some } 2 \le k < l)
= P\Big(\bigcup_{2 \le k < l} A_1^{k,l} \cap A_2^{k,l} \cap A_3^{k,l} \cap A_4^{k,l}\Big) + O\left(\frac{1}{(\log N)^2}\right). \tag{7.15}$$

First consider the probability that $Z'_1 = l$, $Z'_2 = \cdots = Z'_{a+1} = k$, and $Z'_{a+2} = \cdots = Z'_n = 1$
for some $2 \le k < l$ but that not all of $A_1^{k,l}$, $A_2^{k,l}$, $A_3^{k,l}$, $A_4^{k,l}$ occur for any $k$ and $l$. Note
that this can only happen in two ways. One way would be for $A_1^{k,l}$ not to hold, which would
mean $\sigma'(1), \dots, \sigma'(n)$ do not all have distinct ancestors at time $\tau_{l+1}-$. However, it follows from
the argument used to prove (7.6) that $P((A_1^{k,l})^c \cap \{Z'_1 = l\} \cap \{Z'_2 = k\}$ for some $2 \le k < l)$ is
$O((\log N)^{-2})$. The second way would be for $A_1^{k,l}$ to hold but for $\sigma'(2), \dots, \sigma'(a+1)$ not all to
have the same ancestor at time $\tau_{k+1}-$. It follows from Lemma 7.2 that this possibility also has
probability $O((\log N)^{-2})$.

Next, we consider the probability that $A_1^{k,l}$, $A_2^{k,l}$, $A_3^{k,l}$, and $A_4^{k,l}$ all hold, but we do not have
$Z'_1 = l$, $Z'_2 = \cdots = Z'_{a+1} = k$, and $Z'_{a+2} = \cdots = Z'_n = 1$. This is only possible if there is a third
time $\tau$, other than the times between $\tau_l$ and $\tau_{l+1}$ and between $\tau_k$ and $\tau_{k+1}$, such that the type of
the ancestor of one of the individuals $\sigma'(1), \dots, \sigma'(n)$ at time $\tau$ is different from the type of the
ancestor at time $\tau-$. However, it is a consequence of (7.5) that the probability that this occurs
is at most $O((\log\log N)^3/(\log N)^3)$. It follows that (7.15) holds.

Recall from the proof of Lemma 7.2 that if two individuals with an infinite line of descent are
chosen at random at time $\tau_{k+1}-$, then the probability that they will have the same ancestor at
time $\tau_k-$ is $2/k(k-1)$. Since there are $\binom{n}{2}$ pairs of individuals, we have

$$P(A_1^{k,l}) \ge 1 - \binom{n}{2}\sum_{m=l+1}^{H}\frac{2}{m(m-1)} \ge 1 - \frac{n(n-1)}{l}.$$

We have $P(A_2^{k,l} \mid A_1^{k,l}) = r/[r(1-s)+ls]$ by (7.2). Next, note that if we choose at random
one of the $k$ individuals between times $\tau_k$ and $\tau_{k+1}$, then the probability that the individual
born at time $\tau_{k+1}$ is a descendant of the randomly chosen individual is $1/k$, and thereafter
the probability that each new individual is a descendant of the randomly chosen individual is
the fraction of the current individuals that are descended from the randomly chosen individual.
This is the same description as the urn problem of Lemma 7.11, so conditional on $A_1^{k,l}$, the
probability that $\sigma'(2), \dots, \sigma'(a+1)$ but not $\sigma'(a+2), \dots, \sigma'(n)$ are descended from the randomly
chosen individual is $p_{k,l,a,n-1}$. Therefore, $P(A_3^{k,l} \mid A_1^{k,l} \cap A_2^{k,l}) = kp_{k,l,a,n-1}$. By (7.2), we have
$P(A_4^{k,l} \mid A_1^{k,l} \cap A_2^{k,l} \cap A_3^{k,l}) = r/[r(1-s)+ks]$. By the arguments used to prove (7.5), the probability
that $A_1^{k,l} \cap A_2^{k,l} \cap A_3^{k,l} \cap A_4^{k,l}$ holds for more than one pair $(k,l)$ is at most $O((\log\log N)^3/(\log N)^3)$.
Thus,

$$P\Big(\bigcup_{2 \le k < l} A_1^{k,l} \cap A_2^{k,l} \cap A_3^{k,l} \cap A_4^{k,l}\Big)
= \sum_{k=2}^{H}\sum_{l=k+1}^{H} P(A_1^{k,l})\left(\frac{r}{r(1-s)+ls}\right)\left(\frac{kr}{r(1-s)+ks}\right)p_{k,l,a,n-1}
+ O\left(\frac{(\log\log N)^3}{(\log N)^3}\right). \tag{7.16}$$

By Lemma 7.11, we can write $p_{k,l,a,n-1} = q_{k,a,n-1} + \delta$, where $|\delta| \le C/kl$. Also, $P(A_1^{k,l}) = 1 - \eta$,
where $\eta \le C/l$. Note that $r/[r(1-s)+ls] \le r/ls$ and $kr/[r(1-s)+ks] \le r/s$. Recall from
(7.14) that $q_{k,a,n} \le C/k$ for all $a \ge 1$. To complete the proof, we will need to simplify the four
factors inside the sum in (7.16) by obtaining four inequalities. First, note that



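$$0 \le \frac{r}{ls} - \frac{r}{r(1-s)+ls} = \frac{r^2(1-s)}{(r(1-s)+ls)(ls)} \le \frac{r^2}{l^2s^2}.$$

Therefore,

$$\sum_{k=2}^{H}\sum_{l=k+1}^{H}\left|\frac{r}{r(1-s)+ls} - \frac{r}{ls}\right|\left(\frac{r}{s}\right)\left(\frac{C}{k}\right) \le \frac{C}{(\log N)^3}. \tag{7.17}$$

Also,

$$0 \le \frac{r}{s} - \frac{kr}{r(1-s)+ks} = \frac{r^2(1-s)}{(r(1-s)+ks)s} \le \frac{r^2}{ks^2}.$$

Therefore,

$$\sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{ls}\right)\left|\frac{kr}{r(1-s)+ks} - \frac{r}{s}\right|\left(\frac{C}{k}\right) \le \frac{C\log\log N}{(\log N)^3}. \tag{7.18}$$

Also,

$$\sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{ls}\right)\left(\frac{r}{s}\right)|\delta|
\le \sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{ls}\right)\left(\frac{r}{s}\right)\left(\frac{C}{kl}\right) \le \frac{C}{(\log N)^2} \tag{7.19}$$

and

$$\sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{ls}\right)\left(\frac{r}{s}\right)\left(\frac{C}{k}\right)\big(1 - P(A_1^{k,l})\big)
\le \frac{Cr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{1}{kl^2} = O\left(\frac{1}{(\log N)^2}\right). \tag{7.20}$$

It follows from (7.16)-(7.20) that

$$P\Big(\bigcup_{2 \le k < l} A_1^{k,l} \cap A_2^{k,l} \cap A_3^{k,l} \cap A_4^{k,l}\Big)
= \sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{ls}\right)\left(\frac{r}{s}\right)q_{k,a,n-1} + O\left(\frac{1}{(\log N)^2}\right),$$

which, combined with (7.15), implies the lemma. □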



Proof of Lemma 7.6. Suppose $2 \le k < l$. Let $B_1^{k,l}$ be the event that $Z_i \le l$ for $i = 1, \dots, n$. Let
$B_2^{k,l}$ be the event that $Z_1 = l$ and $Z_i \ne l$ for all $2 \le i \le n$. Let $B_3^{k,l}$ be the event that $Z_i \le k$ for
all $2 \le i \le n$. Let $B_4^{k,l}$ be the event that $Z_2 = \cdots = Z_{a+1} = k$ but $Z_i \ne k$ for $a+2 \le i \le n$.
Let $B_5^{k,l}$ be the event that $Z_{a+2} = \cdots = Z_n = 1$. Note that $Z_1 = l$, $Z_2 = \cdots = Z_{a+1} = k$,
and $Z_{a+2} = \cdots = Z_n = 1$ for some $2 \le k < l$ if and only if, for some $2 \le k < l$, the event
$B_1^{k,l} \cap \cdots \cap B_5^{k,l}$ occurs. Furthermore, the events $B_1^{k,l} \cap \cdots \cap B_5^{k,l}$ are disjoint for
different values of $k$ and $l$, so we need to calculate
$\sum_{k=2}^{H}\sum_{l=k+1}^{H} P(B_1^{k,l} \cap B_2^{k,l} \cap B_3^{k,l} \cap B_4^{k,l} \cap B_5^{k,l})$.
We have

$$P(B_1^{k,l}) = \prod_{j=l+1}^{H} E[(1-V_j)^n] \ge \prod_{j=l+1}^{H}(1 - nE[V_j]) \ge 1 - \sum_{j=l+1}^{H} nE[V_j] = 1 - \frac{nr}{s}\sum_{j=l+1}^{H}\frac{1}{j}. \tag{7.21}$$

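By Lemma 7.9,

$$P(B_2^{k,l} \mid B_1^{k,l}) = \frac{r}{s}q_{l,1,n} = \frac{r}{s}\left(\frac{(l-1)(n+l-3)!}{(n+l-1)!}\right)
= \frac{r}{sl}\left[\frac{l(l-1)}{(n+l-1)(n+l-2)}\right] \le \frac{r}{sl}.$$

By the same reasoning used to get (7.21), we have

$$P(B_3^{k,l} \mid B_1^{k,l} \cap B_2^{k,l}) \ge 1 - \frac{(n-1)r}{s}\sum_{j=k+1}^{l-1}\frac{1}{j}. \tag{7.22}$$

By Lemma 7.9,

$$P(B_4^{k,l} \mid B_1^{k,l} \cap B_2^{k,l} \cap B_3^{k,l}) = \frac{r}{s}q_{k,a,n-1}.$$

Finally, by the argument used to establish (7.21) and (7.22),

$$P(B_5^{k,l} \mid B_1^{k,l} \cap B_2^{k,l} \cap B_3^{k,l} \cap B_4^{k,l}) \ge 1 - (n-a-1)\sum_{j=2}^{k-1}\frac{r}{sj}. \tag{7.23}$$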

Note that the product of the probabilities on the right-hand side of (7.21), (7.22), and (7.23) is
at least $1 - n\sum_{j=2}^{H} r/(sj) \ge 1 - C\log H/\log N$. Since $q_{k,a,n-1} \le C/k$ by (7.14), we have

$$\sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{sl}\right)\left(\frac{r}{s}\right)q_{k,a,n-1}\left(\frac{C\log H}{\log N}\right) \le \frac{C(\log H)^3}{(\log N)^3},$$

and so

$$\sum_{k=2}^{H}\sum_{l=k+1}^{H} P(B_1^{k,l} \cap B_2^{k,l} \cap B_3^{k,l} \cap B_4^{k,l} \cap B_5^{k,l})
= \sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{sl}\right)\left(\frac{r}{s}\right)\left[\frac{l(l-1)}{(n+l-1)(n+l-2)}\right]q_{k,a,n-1}
+ O\left(\frac{1}{(\log N)^2}\right). \tag{7.24}$$

Finally, note that $1 - \frac{l(l-1)}{(n+l-1)(n+l-2)} \le C/l$ for some constant $C$. Since $q_{k,a,n-1} \le C/k$ and

$$\sum_{k=2}^{H}\sum_{l=k+1}^{H}\left(\frac{r}{sl}\right)\left(\frac{r}{s}\right)\frac{C}{kl} \le \frac{C}{(\log N)^2},$$

equation (7.24) remains true if the term in brackets is replaced by 1. The lemma follows. □

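Proof of Lemma 7.7. Let $A_1^k$ be the event that one of the $k$ individuals at time $\tau_{k+1}-$ is the
ancestor of $\sigma'(1), \dots, \sigma'(a)$ but not $\sigma'(a+1), \dots, \sigma'(n)$, and let $A_2^k$ be the event that the ancestor
of this individual at time $\tau_k-$ has a different type. It follows from Lemma 7.2 that the probability
that, for some $k \ge 2$, we have $Z'_1 = \cdots = Z'_a = k$ and $Z'_{a+1} = \cdots = Z'_n = 1$ but the event $A_1^k \cap A_2^k$
does not occur is at most $O((\log N)^{-2})$. We will therefore calculate the probability that the event
$A_1^k \cap A_2^k \cap \{Z'_1 = \cdots = Z'_a = k\} \cap \{Z'_{a+1} = \cdots = Z'_n = 1\}$ occurs for some $k \ge 2$. Note that this
occurs for at most one value of $k$, so we may sum the probabilities over $k = 2, \dots, H$.

Note that $P(A_1^k) = kp_{k,H,a,n}$ and $P(A_2^k \mid A_1^k) = r/(r(1-s)+ks)$ by (7.2). It follows that
$P(A_1^k \cap A_2^k) = [kr/(r(1-s)+ks)]\,p_{k,H,a,n}$. Note that $kr/(r(1-s)+ks) \le r/s$, and recall that
$|p_{k,H,a,n} - q_{k,a,n}| \le C/kH$ by Lemma 7.11. Therefore,

$$\sum_{k=2}^{H}\left(\frac{kr}{r(1-s)+ks}\right)|p_{k,H,a,n} - q_{k,a,n}| \le \sum_{k=2}^{H}\frac{Cr}{skH} \le \frac{Cr\log H}{H} \le \frac{C}{(\log N)^2}.$$

It follows that $\sum_{k=2}^{H} P(A_1^k \cap A_2^k) = \sum_{k=2}^{H}\left(\frac{kr}{r(1-s)+ks}\right)q_{k,a,n} + O(1/(\log N)^2)$. Also, $q_{k,a,n} \le C/k$,
so

$$\sum_{k=2}^{H}\left|\frac{kr}{r(1-s)+ks} - \frac{r}{s}\right|q_{k,a,n} \le \sum_{k=2}^{H}\left(\frac{r^2}{ks^2}\right)\left(\frac{C}{k}\right) \le \frac{C}{(\log N)^2}.$$

Thus,

$$\sum_{k=2}^{H} P(A_1^k \cap A_2^k) = \frac{r}{s}\sum_{k=2}^{H} q_{k,a,n} + O\left(\frac{1}{(\log N)^2}\right). \tag{7.25}$$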

If $A_1^k$ and $A_2^k$ both occur, then we will have $Z'_1 = \cdots = Z'_a = k$ and $Z'_{a+1} = \cdots = Z'_n = 1$
unless either $Z'_i = l$ for some $i = 1, \dots, n$ and $l \notin \{1, k\}$, or $Z'_i = k$ for some $i \ge a+1$. By Lemma
7.2, we have $P(A_1^k \cap A_2^k \cap \{Z'_i = k\}$ for some $k \ge 2$ and $i \ge a+1) \le C/(\log N)^2$. Therefore, we
only need to consider the possibility that $Z'_i = l$ for some $i = 1, \dots, n$ and $l \notin \{1, k\}$. We will
treat separately the cases $l < k$ and $l > k$. Note that by (7.5), the probability that $A_1^k$ and $A_2^k$
both occur, $Z'_i = l_1$, and $Z'_j = l_2$, where $l_1$ and $l_2$ are distinct integers not in $\{1, k\}$, is at most
$O((\log\log N)^3/(\log N)^3)$.

We first consider $l > k$. By (7.6), the probability that $A_1^k$ and $A_2^k$ both occur and $Z'_i = Z'_j = l$
for some $i \ne j$ is $O((\log N)^{-2})$. By the same argument used to prove Lemma 7.5, the probability
that $A_1^k \cap A_2^k$ occurs for some $k$ but $Z'_i = l$ for some $l > k$ is

$$\frac{nr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,a,n}}{l} + O\left(\frac{1}{(\log N)^2}\right). \tag{7.26}$$

There are two differences between this formula and the result of Lemma 7.5, which can be
explained as follows. First, in place of the event $A_2^{k,l}$, we need the event that, for some $i = 1, \dots, n$,
the ancestor of $\sigma'(i)$ at time $\tau_l-$ has a different type from the ancestor of $\sigma'(i)$ at time $\tau_{l+1}-$.
This is why the double summation is multiplied by $n$. Second, instead of $A_3^{k,l}$, we need one of
the individuals at time $\tau_{k+1}-$ to be the ancestor of $\sigma'(1), \dots, \sigma'(a)$ but not $\sigma'(a+1), \dots, \sigma'(n)$,
rather than $\sigma'(2), \dots, \sigma'(a+1)$ but not $\sigma'(a+2), \dots, \sigma'(n)$. This is why we have $q_{k,a,n}$ in the
formula rather than $q_{k,a,n-1}$. Otherwise, the calculation proceeds as before.

If $a \ge 2$, a consequence of (7.6) is that the probability that $A_1^k \cap A_2^k$ occurs for some $k$ but $Z'_i = l$ for
some $l < k$ is $O((\log N)^{-2})$. Thus, (7.8) follows by subtracting (7.26) from (7.25). Now, consider
the case $a = 1$. Let $S$ be a $d$-element subset of $\{2, \dots, n\}$. By the argument used to prove
Lemma 7.5, the probability that, for some $2 \le l < k$, the events $A_1^k$ and $A_2^k$ occur but $Z'_i = l$
for $i \in S$ and $Z'_i = 1$ for $i \in \{2, \dots, n\} \setminus S$ is

$$\frac{r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{q_{l,d,n-1}}{k} + O\left(\frac{1}{(\log N)^2}\right).$$

Summing this over $d = 1, \dots, n-1$ and all subsets $S$ of size $d$, we get that the probability that
$A_1^k$ and $A_2^k$ occur but $Z'_i = l$ for $i \in S$ and $Z'_i = 1$ for $i \in \{2, \dots, n\} \setminus S$ for some nonempty
$S \subset \{2, \dots, n\}$ is

$$\frac{r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{1}{k}\sum_{d=1}^{n-1}\binom{n-1}{d}q_{l,d,n-1} + O\left(\frac{1}{(\log N)^2}\right). \tag{7.27}$$

Using the probabilistic interpretation of the $q_{l,d,n-1}$ as in Lemma 7.10, we have

$$\sum_{d=1}^{n-1}\binom{n-1}{d}q_{l,d,n-1} = 1 - q_{l,0,n-1} = 1 - \frac{(l-1)((n-1)+l-2)!}{((n-1)+l-1)!} = 1 - \frac{l-1}{n+l-2} = \frac{n-1}{n+l-2}.$$

Thus, (7.27) becomes

$$\frac{(n-1)r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{1}{k(n+l-2)} + O\left(\frac{1}{(\log N)^2}\right). \tag{7.28}$$

We get (7.9) by subtracting (7.26) and (7.28) from (7.25). □

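Lemma 7.12. Let $\delta_1, \dots, \delta_N \in (0, 1)$. Assume that $\delta = \delta_1 + \cdots + \delta_N \in (0, 1)$. Then

$$\delta(1-\delta) \le 1 - \prod_{n=1}^{N}(1-\delta_n) \le \delta.$$

Proof. The second inequality follows from $\left|\prod_{n=1}^{N} 1 - \prod_{n=1}^{N}(1-\delta_n)\right| \le \sum_{n=1}^{N}\delta_n$. To prove the
first inequality using the second, note that

$$1 - \prod_{n=1}^{N}(1-\delta_n) = \sum_{m=1}^{N}\left(\prod_{n=1}^{m-1}(1-\delta_n) - \prod_{n=1}^{m}(1-\delta_n)\right)
= \sum_{m=1}^{N}\delta_m\prod_{n=1}^{m-1}(1-\delta_n) \ge \sum_{m=1}^{N}\delta_m(1-\delta) = \delta(1-\delta). \qquad □$$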
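The two-sided bound of Lemma 7.12 is easy to confirm numerically. The sketch below evaluates the three quantities exactly for one sample choice of the $\delta_n$; the function name and the sample values are assumptions made only for this illustration.

```python
from fractions import Fraction
from math import prod


def lemma_7_12_bounds(deltas):
    """Return (delta (1 - delta), 1 - prod(1 - delta_n), delta) for the given delta_n."""
    delta = sum(deltas)
    middle = 1 - prod(1 - d for d in deltas)
    return delta * (1 - delta), middle, delta


if __name__ == "__main__":
    deltas = [Fraction(1, 10), Fraction(1, 20), Fraction(1, 5)]   # illustrative values only
    lower, middle, upper = lemma_7_12_bounds(deltas)
    assert lower <= middle <= upper
    print(lower, middle, upper)
```

In the proof of Lemma 7.8 the lemma is used in the form $\prod_n (1-\delta_n) = 1 - \delta + O(\delta^2)$.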






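Proof of Lemma 7.8. Let $B_1^k = \{Z_i \le k$ for $i = 1, \dots, n\}$. Let $B_2^k = \{Z_i = k$ for $1 \le i \le a$ and
$Z_j < k$ for $a+1 \le j \le n\}$. Let $B_3^k = \{Z_i = 1$ for $a+1 \le i \le n\}$. We have

$$P(B_1^k \cap B_2^k \cap B_3^k \text{ for some } k \ge 2) = \sum_{k=2}^{H} P(B_1^k)P(B_2^k \mid B_1^k)P(B_3^k \mid B_1^k \cap B_2^k)$$

$$= \sum_{k=2}^{H}\left(\prod_{l=k+1}^{H} E[(1-V_l)^n]\right)\left(\frac{r}{s}q_{k,a,n}\right)\left(\prod_{l=2}^{k-1} E[(1-V_l)^{n-a}]\right). \tag{7.29}$$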

Using Lemma 7.9,

$$E[(1-V_l)^m] = \left(1 - \frac{r}{s}\right) + \frac{r}{s}q_{l,0,m} = 1 - \frac{r}{s} + \frac{r}{s}\left(\frac{(l-1)(m+l-2)!}{(m+l-1)!}\right) = 1 - \frac{mr}{s(m+l-1)}.$$

Therefore, the expression on the right-hand side of (7.29) is

$$\sum_{k=2}^{H}\frac{r}{s}\left[\prod_{l=k+1}^{H}\left(1 - \frac{nr}{s(n+l-1)}\right)\right]\left[\prod_{l=2}^{k-1}\left(1 - \frac{(n-a)r}{s(n-a+l-1)}\right)\right]q_{k,a,n}.$$

Let $\delta = \frac{nr}{s}\sum_{l=k+1}^{H}\frac{1}{n+l-1} + \frac{(n-a)r}{s}\sum_{l=2}^{k-1}\frac{1}{n-a+l-1}$. Then

$$\delta^2 \le \left(\frac{nr}{s}\sum_{l=1}^{H}\frac{1}{l}\right)^2 \le Cr^2(\log H)^2.$$

Since $q_{k,a,n} \le C/k$ by (7.14), we have
$\frac{r}{s}\sum_{k=2}^{H}\delta^2 q_{k,a,n} \le Cr^3(\log H)^2\sum_{k=2}^{H}\frac{1}{k} \le Cr^3(\log H)^3$.

Using Lemma 7.12, the right-hand side of (7.29) can be written as

$$\sum_{k=2}^{H}\frac{r}{s}\left[1 - \sum_{l=k+1}^{H}\frac{nr}{s(n+l-1)} - \sum_{l=2}^{k-1}\frac{(n-a)r}{s(n-a+l-1)}\right]q_{k,a,n} + O\left(\frac{(\log\log N)^3}{(\log N)^3}\right). \tag{7.30}$$

We have $0 \le \frac{1}{l} - \frac{1}{n+l-1} = \frac{n-1}{l(n+l-1)} \le \frac{n}{l^2}$. Since $q_{k,a,n} \le C/k$, it follows that

$$\left|\frac{nr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,a,n}}{n+l-1} - \frac{nr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{q_{k,a,n}}{l}\right|
\le \frac{Cr^2}{s^2}\sum_{k=2}^{H}\sum_{l=k+1}^{H}\frac{1}{kl^2} \le \frac{C}{(\log N)^2}. \tag{7.31}$$

Since $q_{k,a,n} \le C/k^a$ by (7.14), when $a \ge 2$ we have

$$\frac{(n-a)r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{q_{k,a,n}}{n-a+l-1} \le \frac{Cr^2}{s^2}\sum_{l=2}^{H}\sum_{k=l+1}^{H}\frac{1}{lk^2} \le \frac{C}{(\log N)^2}. \tag{7.32}$$

By combining (7.30), (7.31), and (7.32), we get (7.10) when $a \ge 2$. When $a = 1$, note that

$$\frac{n-a}{n-a+l-1} = \frac{n-1}{n+l-2}.$$

Also, note that $0 \le \frac{1}{k} - q_{k,1,n} \le \frac{C}{k^2}$. It follows that, when $a = 1$, we have

$$\left|\frac{(n-1)r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{q_{k,1,n}}{n+l-2} - \frac{(n-1)r^2}{s^2}\sum_{k=2}^{H}\sum_{l=2}^{k-1}\frac{1}{k(n+l-2)}\right| \le \frac{C}{(\log N)^2}. \tag{7.33}$$

Equations (7.30), (7.31), and (7.33) establish (7.11) when $a = 1$. □